The present application claims priority to provisional application U.S. Serial No. 
60/235,557, filed September 27, 2000 (Atty. Docket CL000862-PROV) and U.S. Serial No. 
09/734,675, filed December 13, 2000 (Atty. Docket CL000862). 

FIELD OF THE INVENTION 

The present invention is in the field of protease proteins that are related to the 
protease subfamily, recombinant DNA molecules, and protein production. The present invention 
specifically provides novel peptides and proteins that effect protein cleavage/processing/turnover 
and nucleic acid molecules encoding such peptide and protein molecules, all of which are useful 
in the development of human therapeutics and diagnostic compositions and methods. 

BACKGROUND OF THE INVENTION 

The proteases may be categorized into families by the different amino acid sequences 
(generally between 2 and 10 residues) located on either side of the cleavage site of the protease. 

The proper functioning of the cell requires careful control of the levels of important 
structural proteins, enzymes, and regulatory proteins. One of the ways that cells can reduce the 
steady state level of a particular protein is by proteolytic degradation. Further, one of the ways 
cells produce functioning proteins is to produce pre- or pro-protein precursors that are processed 
by proteolytic degradation to produce an active moiety. Thus, complex and highly regulated 
mechanisms have evolved to accomplish this degradation. 

Proteases regulate many different cell proliferation, differentiation, and signaling 
processes by regulating protein turnover and processing. Uncontrolled protease activity (either 
increased or decreased) has been implicated in a variety of disease conditions including 
inflammation, cancer, arteriosclerosis, and degenerative disorders. 

An additional role of intracellular proteolysis is in the stress-response. Cells that are 
subject to stress such as starvation, heat-shock, chemical insult or mutation respond by 
increasing the rates of proteolysis. One function of this enhanced proteolysis is to salvage amino 
acids from non-essential proteins. These amino acids can then be re-utilized in the synthesis of 
essential proteins or metabolized directly to provide energy. Another function is in the repair of 
damage caused by the stress. For example, oxidative stress has been shown to damage a variety 
of proteins and cause them to be rapidly degraded. 

The International Union of Biochemistry and Molecular Biology (IUBMB) has 
recommended using the term peptidase for the subset of peptide bond hydrolases (Subclass E.C. 
3.4.). The widely used term protease is synonymous with peptidase. Peptidases comprise two 
groups of enzymes: the endopeptidases and the exopeptidases, which cleave peptide bonds at 
points within the protein and remove amino acids sequentially from either the N- or C-terminus, 
respectively. The term proteinase is also used as a synonym for endopeptidase, and four 
mechanistic classes of proteinases are recognized by the IUBMB; two of these are described 
below (also see: Handbook of Proteolytic Enzymes by Barrett, Rawlings, and Woessner, AP 
Press, NY 1998). Also, for a review of the various uses of proteases as drug targets, see: Weber 
M, Emerging treatments for hypertension: potential role for vasopeptidase inhibition; Am J 
Hypertens 1999 Nov;12(11 Pt 2):139S-147S; Kentsch M, Otter W, Novel neurohormonal 
modulators in cardiovascular disorders. The therapeutic potential of endopeptidase inhibitors, 
Drugs R D 1999 Apr;1(4):331-8; Scarborough RM, Coagulation factor Xa: the prothrombinase 
complex as an emerging therapeutic target for small molecule inhibitors, J Enzym Inhib 
1998;14(1):15-25; Skotnicki JS, et al., Design and synthetic considerations of matrix 
metalloproteinase inhibitors, Ann N Y Acad Sci 1999 Jun 30;878:61-72; McKerrow JH, Engel 
JC, Caffrey CR, Cysteine protease inhibitors as chemotherapy for parasitic infections, Bioorg 
Med Chem 1999 Apr;7(4):639-44; Rice KD, Tanaka RD, Katz BA, Numerof RP, Moore WR, 
Inhibitors of tryptase for the treatment of mast cell-mediated diseases, Curr Pharm Des 1998 
Oct;4(5):381-96; Materson BJ, Will angiotensin converting enzyme genotype, receptor mutation 
identification, and other miracles of molecular biology permit reduction of NNT? Am J Hypertens 
1998 Aug;11(8 Pt 2):138S-142S. 



Serine Proteases 

The serine proteases (SP) are a large family of proteolytic enzymes that include the 
digestive enzymes, trypsin and chymotrypsin, components of the complement cascade and of the 
blood-clotting cascade, and enzymes that control the degradation and turnover of 
macromolecules of the extracellular matrix. SP are so named because of the presence of a serine 
residue in the active catalytic site for protein cleavage. SP have a wide range of substrate 
specificities and can be subdivided into subfamilies on the basis of these specificities. The main 




WO 02/26947 

PCT/US01/29960 

subfamilies are tryptases (cleavage after arginine or lysine), aspases (cleavage after aspartate), 
chymases (cleavage after phenylalanine or leucine), metases (cleavage after methionine), and 
serases (cleavage after serine). 
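
By way of illustration only, the subfamily specificities listed above can be written as a lookup from the residue preceding the cleaved bond to the subfamily name. The following Python sketch is not part of the disclosure; the names SUBFAMILY_BY_P1 and candidate_cleavage_sites are hypothetical, and keying on a single residue is a simplifying assumption that ignores the flanking sequence context that actual proteases recognize.

```python
# Hypothetical lookup built only from the specificities stated in the text:
# the residue after which each serine protease subfamily cleaves.
SUBFAMILY_BY_P1 = {
    "R": "tryptase", "K": "tryptase",
    "D": "aspase",
    "F": "chymase", "L": "chymase",
    "M": "metase",
    "S": "serase",
}


def candidate_cleavage_sites(substrate):
    """List (position, residue, subfamily) for each bond a subfamily could cut.

    position is the 1-based index of the residue after which cleavage would
    occur. The last residue is skipped because no peptide bond follows it.
    """
    seq = substrate.upper()
    sites = []
    for index, residue in enumerate(seq[:-1], start=1):
        subfamily = SUBFAMILY_BY_P1.get(residue)
        if subfamily is not None:
            sites.append((index, residue, subfamily))
    return sites


if __name__ == "__main__":
    # Toy peptide chosen for illustration; it is not a sequence from the Figures.
    for site in candidate_cleavage_sites("GAKDM"):
        print(site)
```

Such a table only flags candidate bonds; it makes no prediction that a given enzyme will in fact cleave at any of them.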

A series of six SP have been identified in murine cytotoxic T-lymphocytes (CTL) and 
natural killer (NK) cells. These SP are involved with CTL and NK cells in the destruction of 
virally transformed cells and tumor cells and in organ and tissue transplant rejection (Zunino, S. 
J. et al. (1990) J. Immunol. 144:2001-9; Sayers, T. J. et al. (1994) J. Immunol. 152:2289-97). 
Human homologs of most of these enzymes have been identified (Trapani, J. A. et al. (1988) 
Proc. Natl. Acad. Sci. 85:6924-28; Caputo, A. et al. (1990) J. Immunol. 145:737-44). Like all 
SP, the CTL-SP share three distinguishing features: 1) the presence of a catalytic triad of 
histidine, serine, and aspartate residues which comprise the active site; 2) the sequence GDSGGP 
which contains the active site serine; and 3) an N-terminal IIGG sequence which characterizes 
the mature SP. 
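
As a hedged illustration of the second feature, the GDSGGP sequence can be located in a candidate amino acid sequence by a plain substring search, with the active site serine taken as the third residue of each hit. The function names find_motif and active_site_serine below are hypothetical and the test sequence is a toy string, not a sequence of the invention; a real analysis would tolerate motif variants and would also examine the remaining triad residues.

```python
def find_motif(sequence, motif="GDSGGP"):
    """Return 1-based start positions of every occurrence of motif."""
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)   # continue searching after this hit
    return positions


def active_site_serine(sequence, motif="GDSGGP"):
    """Return 1-based positions of the serine inside each GDSGGP hit.

    The serine is the third residue of the motif, hence the offset of 2.
    """
    return [position + 2 for position in find_motif(sequence, motif)]


if __name__ == "__main__":
    toy = "AAAGDSGGPLV"  # illustrative only
    print(find_motif(toy), active_site_serine(toy))
```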

The SP are secretory proteins which contain N-terminal signal peptides that serve to 
export the immature protein across the endoplasmic reticulum and are then cleaved (von Heijne 
(1986) Nuc. Acid. Res. 14:5683-90). Differences in these signal sequences provide one means of 
distinguishing individual SP. Some SP, particularly the digestive enzymes, exist as inactive 
precursors or preproenzymes, and contain a leader or activation peptide sequence 3' of the signal 
peptide. This activation peptide may be 2-12 amino acids in length, and it extends from the 
cleavage site of the signal peptide to the N-terminal IIGG sequence of the active, mature protein. 
Cleavage of this sequence activates the enzyme. This sequence varies in different SP according 
to the biochemical pathway and/or its substrate (Zunino et al., supra; Sayers et al., supra). Other 
features that distinguish various SP are the presence or absence of N-linked glycosylation sites 
that provide membrane anchors, the number and distribution of cysteine residues that determine 
the secondary structure of the SP, and the sequence of substrate binding sites such as S'. The S' 
substrate binding region is defined by residues extending from approximately +17 to +29 relative 
to the N-terminal I (+1). Differences in this region of the molecule are believed to determine SP 
substrate specificities (Zunino et al., supra). 

Trypsin-like serine proteases have been isolated from patients with chronic airway 
diseases and may play a role in respiratory diseases and host defense systems on the mucous 
membranes of the respiratory system (see Yamaoka et al., J. Biol. Chem. 273: 11895-11901, 
1998 and Yasuoka et al., Am. J. Resp. Cell Molec. Biol. 16: 300-308, 1997). Therefore, novel 
human serine protease proteins, and encoding genes, may be useful for screening for, diagnosing, 
preventing, and/or treating disorders such as respiratory diseases. For example, serine protease 
genes/proteins may be useful in drug development, such as by serving as novel drug targets for 
respiratory disease, and SNPs in serine protease genes may be useful markers for diagnostic kits 
for respiratory diseases. 



Trypsinogens 

The trypsinogens are serine proteases secreted by exocrine cells of the pancreas (Travis J 
and Roberts R, Biochemistry 1969; 8: 2884-9; Mallory P and Travis J, Biochemistry 1973; 12: 
2847-51). Two major types of trypsinogen isoenzymes have been characterized, trypsinogen-1, 
also called cationic trypsinogen, and trypsinogen-2 or anionic trypsinogen. The trypsinogen 
proenzymes are activated to trypsins in the intestine by enterokinase, which removes an 
activation peptide from the N-terminus of the trypsinogens. The trypsinogens show a high degree 
of sequence homology, but they can be separated on the basis of charge differences by using 
electrophoresis or ion exchange chromatography. The major form of trypsinogen in the pancreas 
and pancreatic juice is trypsinogen-1 (Guy CO et al., Biochem Biophys Res Commun 1984; 125: 
516-23). In serum of healthy subjects, trypsinogen-1 is also the major form, whereas in patients 
with pancreatitis, trypsinogen-2 is more strongly elevated (Itkonen et al., J Lab Clin Med 1990; 
115:712-8). Trypsinogens also occur in certain ovarian tumors, in which trypsinogen-2 is the 
major form (Koivunen et al., Cancer Res 1990; 50: 2375-8). Trypsin-1 in complex with alpha-1- 
antitrypsin, also called alpha-1-antiprotease, has been found to occur in serum of patients with 
pancreatitis (Borgstrom A and Ohlsson K, Scand J Clin Lab Invest 1984; 44: 381-6) but 
determination of this complex has not been found useful for differentiation between pancreatic 
and other gastrointestinal diseases (Borgstrom et al., Scand J Clin Lab Invest 1989; 49:757-62). 

Trypsinogen-1 and -2 are closely related immunologically (Kimland et al., Clin Chim 
Acta 1989; 184: 31-46; Itkonen et al., 1990), but by using monoclonal antibodies (Itkonen et al., 
1990) or by absorbing polyclonal antisera (Kimland et al., 1989) it is possible to obtain reagents 
enabling specific measurement of each form of trypsinogen. 

When active trypsin reaches the blood stream, it is inactivated by the major trypsin 
inhibitors alpha-2-macroglobulin and alpha-1-antitrypsin (AAT). AAT is a 58 kilodalton serine 
protease inhibitor synthesized in the liver and is one of the main protease inhibitors in blood. 
Whereas complexes between trypsin-1 and AAT are detectable in serum (Borgstrom and 
Ohlsson, 1984), the complexes with alpha-2-macroglobulin are not measurable with antibody- 
based assays (Ohlsson K, Acta Gastroenterol Belg 1988; 51: 3-12). 

Inflammation of the pancreas, or pancreatitis, may be classified as either acute or chronic 
by clinical criteria. With treatment, acute pancreatitis can often be cured and normal function 
restored. Chronic pancreatitis often results in permanent damage. The precise mechanisms which 
trigger acute inflammation are not understood. However, some causes in the order of their 
importance are alcohol ingestion, biliary tract disease, post-operative trauma, and hereditary 
pancreatitis. One theory provides that autodigestion, the premature activation of proteolytic 
enzymes in the pancreas rather than in the duodenum, causes acute pancreatitis. Any number of 
other factors including endotoxins, exotoxins, viral infections, ischemia, anoxia, and direct 
trauma may activate the proenzymes. In addition, any internal or external blockage of pancreatic 
ducts can also cause an accumulation of pancreatic juices in the pancreas, resulting in cellular 
damage. 

Anatomy, physiology, and diseases of the pancreas are reviewed, inter alia, in Guyton 
AC (1991) Textbook of Medical Physiology, W B Saunders Co, Philadelphia Pa.; Isselbacher K 
J et al (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York City; Johnson 
K E (1991) Histology and Cell Biology, Harwal Publishing, Media Pa.; and The Merck Manual 
of Diagnosis and Therapy (1992) Merck Research Laboratories, Rahway N.J. 



Metalloprotease 

The metalloproteases may be one of the older classes of proteinases and are found in 
bacteria, fungi as well as in higher organisms. They differ widely in their sequences and their 
structures but the great majority of enzymes contain a zinc atom which is catalytically active. In 
some cases, zinc may be replaced by another metal such as cobalt or nickel without loss of the 
activity. Bacterial thermolysin has been well characterized and its crystallographic structure 
indicates that zinc is bound by two histidines and one glutamic acid. Many enzymes contain the 
sequence HEXXH, which provides two histidine ligands for the zinc whereas the third ligand is 
either a glutamic acid (thermolysin, neprilysin, alanyl aminopeptidase) or a histidine (astacin). 
Other families exhibit a distinct mode of binding of the Zn atom. The catalytic mechanism leads 
to the formation of a non-covalent tetrahedral intermediate after the attack of a zinc-bound water 
molecule on the carbonyl group of the scissile bond. This intermediate is further decomposed by 
transfer of the glutamic acid proton to the leaving group. 

Metalloproteases contain a catalytic zinc metal center which participates in the hydrolysis 
of the peptide backbone (reviewed in Power and Harper, in Protease Inhibitors, A. J. Barrett and 
G. Salvesen (eds.) Elsevier, Amsterdam, 1986, p. 219). The active zinc center differentiates 
some of these proteases from calpains and trypsins whose activities are dependent upon the 
presence of calcium. Examples of metalloproteases include carboxypeptidase A, 
carboxypeptidase B, and thermolysin. 

Metalloproteases have been isolated from a number of procaryotic and eucaryotic 
sources, e.g. Bacillus subtilis (McConn et al., 1964, J. Biol. Chem. 239:3706); Bacillus 
megaterium; Serratia (Miyata et al., 1971, Agr. Biol. Chem. 35:460); Clostridium bifermentans 
(MacFarlane et al., 1992, App. Environ. Microbiol. 58:1195-1200); and Legionella pneumophila 
(Moffat et al., 1994, Infection and Immunity 62:751-3). In particular, acidic metalloproteases 
have been isolated from broad-banded copperhead venoms (Johnson and Ownby, 1993, Int. J. 
Biochem. 25:267-278), rattlesnake venoms (Chlou et al., 1992, Biochem. Biophys. Res. 
Commun. 187:389-396) and articular cartilage (Treadwell et al., 1986, Arch. Biochem. Biophys. 
251:715-723). Neutral metalloproteases, specifically those having optimal activity at neutral pH, 
have, for example, been isolated from Aspergillus sojae (Sekine, 1973, Agric. Biol. Chem. 
37:1945-1952). Neutral metalloproteases obtained from Aspergillus have been classified into 
two groups, npI and npII (Sekine, 1972, Agric. Biol. Chem. 36:207-216). So far, success in 
obtaining amino acid sequence information from these fungal neutral metalloproteases has been 
limited. An npII metalloprotease isolated from Aspergillus oryzae has been cloned based on 
amino acid sequence presented in the literature (Tatsumi et al., 1991, Mol. Gen. Genet. 228:97- 
103). However, to date, no npI fungal metalloprotease has been cloned or sequenced. Alkaline 
metalloproteases, for example, have been isolated from Pseudomonas aeruginosa (Baumann et 
al., 1993, EMBO J 12:3357-3364) and the insect pathogen Xenorhabdus luminescens (Schmidt 
et al., 1998, Appl. Environ. Microbiol. 54:2793-2797). 

Metalloproteases have been divided into several distinct families based primarily on 
activity and structure: 1) water nucleophile; water bound by single zinc ion ligated to two His 
(within the motif HEXXH) and Glu, His or Asp; 2) water nucleophile; water bound by single 
zinc ion ligated to His, Glu (within the motif HXXE) and His; 3) water nucleophile; water bound 
by single zinc ion ligated to His, Asp and His; 4) water nucleophile; water bound by single zinc 
ion ligated to two His (within the motif HXXEH) and Glu; and 5) water nucleophile; water bound 
by two zinc ions ligated by Lys, Asp, Asp, Asp, Glu. 
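
For illustration, the three sequence motifs named in the families above (HEXXH, HXXE and HXXEH, where X is any residue) can be expressed as regular expressions and scanned against an amino acid sequence. The sketch below is an assumption-laden example rather than a classification method of the invention: the names ZINC_MOTIFS and scan_zinc_motifs are hypothetical, and a motif hit alone does not establish zinc binding or family membership, since the third ligand lies elsewhere in the sequence.

```python
import re

# Motifs as named in the text; "." stands for X (any residue).
ZINC_MOTIFS = {
    "HEXXH": "HE..H",
    "HXXE": "H..E",
    "HXXEH": "H..EH",
}


def scan_zinc_motifs(sequence):
    """Return {motif name: [1-based start positions]} for each motif.

    A zero-width lookahead is used so that overlapping occurrences are
    all reported instead of only the first of each overlapping run.
    """
    seq = sequence.upper()
    hits = {}
    for name, pattern in ZINC_MOTIFS.items():
        lookahead = re.compile("(?=" + pattern + ")")
        hits[name] = [match.start() + 1 for match in lookahead.finditer(seq)]
    return hits


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(scan_zinc_motifs("AAHEAAHGG"))
    print(scan_zinc_motifs("HAAEH"))
```

Because HXXEH contains HXXE, every HXXEH hit is also reported as an HXXE hit; a caller wishing to separate families 2 and 4 would need to subtract one set from the other.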

Examples of members of the metalloproteinase family include, but are not limited to, 
membrane alanyl aminopeptidase (Homo sapiens), germinal peptidyl-dipeptidase A (Homo 
sapiens), thimet oligopeptidase (Rattus norvegicus), oligopeptidase F (Lactococcus lactis), 
mycolysin (Streptomyces cacaoi), immune inhibitor A (Bacillus thuringiensis), snapalysin 
(Streptomyces lividans), leishmanolysin (Leishmania major), microbial collagenase (Vibrio 
alginolyticus), microbial collagenase, class I (Clostridium perfringens), collagenase 1 (Homo 
sapiens), serralysin (Serratia marcescens), fragilysin (Bacteroides fragilis), gametolysin 
(Chlamydomonas reinhardtii), astacin (Astacus fluviatilis), adamalysin (Crotalus adamanteus), 
ADAM 10 (Bos taurus), neprilysin (Homo sapiens), carboxypeptidase A (Homo sapiens), 
carboxypeptidase E (Bos taurus), gamma-D-glutamyl-(L)-meso-diaminopimelate peptidase I 
(Bacillus sphaericus), vanY D-Ala-D-Ala carboxypeptidase (Enterococcus faecium), endolysin 
(bacteriophage A118), pitrilysin (Escherichia coli), mitochondrial processing peptidase 
(Saccharomyces cerevisiae), leucyl aminopeptidase (Bos taurus), aminopeptidase I 
(Saccharomyces cerevisiae), membrane dipeptidase (Homo sapiens), glutamate carboxypeptidase 
(Pseudomonas sp.), Gly-X carboxypeptidase (Saccharomyces cerevisiae), O-sialoglycoprotein 
endopeptidase (Pasteurella haemolytica), beta-lytic metalloendopeptidase (Achromobacter 
lyticus), methionyl aminopeptidase I (Escherichia coli), X-Pro aminopeptidase (Escherichia 
coli), X-His dipeptidase (Escherichia coli), IgA1-specific metalloendopeptidase (Streptococcus 
sanguis), tentoxilysin (Clostridium tetani), leucyl aminopeptidase (Vibrio proteolyticus), 
aminopeptidase (Streptomyces griseus), IAP aminopeptidase (Escherichia coli), aminopeptidase 
T (Thermus aquaticus), hyicolysin (Staphylococcus hyicus), carboxypeptidase Taq (Thermus 
aquaticus), anthrax lethal factor (Bacillus anthracis), penicillolysin (Penicillium citrinum), 
fungalysin (Aspergillus fumigatus), lysostaphin (Staphylococcus simulans), beta-aspartyl 
dipeptidase (Escherichia coli), carboxypeptidase Ss1 (Sulfolobus solfataricus), FtsH 
endopeptidase (Escherichia coli), glutamyl aminopeptidase (Lactococcus lactis), cytophagalysin 
(Cytophaga sp.), metalloendopeptidase (vaccinia virus), VanX D-Ala-D-Ala dipeptidase 
(Enterococcus faecium), Ste24p endopeptidase (Saccharomyces cerevisiae), dipeptidyl-peptidase 
III (Rattus norvegicus), S2P protease (Homo sapiens), sporulation factor SpoIVFB (Bacillus 
subtilis), and HYBD endopeptidase (Escherichia coli). 

Metalloproteases have been found to have a number of uses. For example, there is strong 
evidence that a metalloprotease is involved in the in vivo proteolytic processing of the 
vasoconstrictor, endothelin-1. Rat metalloprotease has been found to be involved in peptide 
hormone processing. One important subfamily of the metalloproteases is the matrix 
metalloproteases. 

A number of diseases are thought to be mediated by excess or undesired metalloprotease 
activity or by an imbalance in the ratio of the various members of the protease family of proteins. 
These include: a) osteoarthritis (Woessner, et al., J. Biol. Chem. 259(6), 3633, 1984; Phadke, et 
al., J. Rheumatol. 10, 852, 1983), b) rheumatoid arthritis (Mullins, et al., Biochim. Biophys. Acta 
695, 117, 1983; Woolley, et al., Arthritis Rheum. 20, 1231, 1977; Gravallese, et al., [...] 
34, [...], 1991), c) septic [...] ([...] et al., Cancer Res. 48, 3307, 1988, and Matrisian, et al., 
Proc. Nat'l. Acad. Sci., USA 83, [...]), [...] 1987), [...] corneal ulceration (Burns, et al., 
Invest. Ophthalmol. Vis. Sci. 30, [...]), [...] (Woessner, et al., Steroids 54, 491, 1989), 
[...] dystrophobic epidermolysis bullosa (Kronberger, et al., J. Invest. Dermatol. 79, [...]), 
and [...] degenerative cartilage loss following traumatic [...] 

[...] (Escherichia coli), [...] peptidase (Ascaris [...]), [...] putative proteinase of Skippy 
retrotransposon (Fusarium oxysporum), tetravirus endopeptidase 
(Nudaurelia capensis omega virus), presenilin 1 (Homo sapiens). 

Proteases and Cancer 

Proteases are critical elements at several stages in the progression of metastatic [...] 
expansion of a tumor in the primary site, evasion from this site as well as homing and invasion in 
[...] dependent on proteolytic tissue remodeling. Transfection experiments with various types of 
proteases have shown that the matrix metalloproteases play a dominant role in these processes, in 
particular gelatinases A and B (MMP-2 and MMP-9, respectively). For an overview of this field 
see Mullins, et al., Biochim. Biophys. Acta 695, [...]; [...] Eur. [...] J. 7, 2062, 
1994; Birkedal-Hansen, et al., Crit. Rev. Oral Biol. Med. 4, 197, 1993. 

Furthermore, it was demonstrated that inhibition of degradation of extracellular matrix by 
the native matrix metalloprotease inhibitor TIMP-2 (a protein) arrests cancer growth (DeClerck, 
et al., Cancer Res. 52, 701, 1992) and that TIMP-2 inhibits tumor-induced angiogenesis in 
experimental systems (Moses, et al., Science 248, 1408, 1990). For a review, see DeClerck, et al., 
Ann. N. Y. Acad. Sci. 732, 222, 1994. It was further demonstrated that the synthetic matrix 
metalloprotease inhibitor batimastat when given intraperitoneally inhibits human colon tumor 
growth and spread in an orthotopic model in nude mice (Wang, et al., Cancer Res. 54, 4726, 
1994) and prolongs the survival of mice bearing human ovarian carcinoma xenografts (Davies, 
et al., Cancer Res. 53, 2087, 1993). The use of this and related compounds has been described in 
Brown, et al., WO-9321942 A2. 

[...] treatment of other diseases as noted [...] (e.g., [...], WO-9519965 A1; [...], 
WO-9519956 A1; Beckett, et al., WO-9519957 A1; Beckett, et al., WO-95[...]; [...], et 
al., [...]; [...], WO-9421625 A1; Dickens, et al., U.S. Pat. No. 
4,599,361; Hughes, et al., U.S. Pat. No. 5,190,937; Broadhurst, et al., EP 574758 A1; 
Broadhurst, et al., EP 276436; and Myers, et al., EP 520573 A1). 

[...] members of this subfamily of protease proteins. The 
present invention advances the state of the art by providing previously unidentified human 
protease proteins that have homology to members of the serine subfamily. 

SUMMARY OF THE INVENTION 

The present invention is based in part on the identification of amino acid sequences of 
human protease peptides and proteins [...] 
allelic variants and other mammalian orthologs thereof. These unique peptide sequences, and 
nucleic acid sequences that encode these peptides, can be used as models for the development of 
human therapeutic targets, aid in the identification of therapeutic proteins, and serve as targets 
for the development of human therapeutic agents that modulate protease activity in cells and 
tissues that express the protease. Experimental data as provided in Figure 1 indicates expression 
in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in 
cancers. 

DESCRIPTION OF THE FIGURE SHEETS 

FIGURE 1 provides the nucleotide sequence of a cDNA molecule that encodes the 
protease protein of the present invention. (SEQ ID NO:1) In addition, structure and functional 
information is provided, such as ATG start, stop and tissue distribution, where available, that 
allows one to readily determine specific uses of inventions based on this molecular sequence. 
Experimental data as provided in Figure 1 indicates expression in humans in testis, placenta, 
fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. 

FIGURE 2 provides the predicted amino acid sequence of the protease of the present 
invention. (SEQ ID NO:2) [...] 
determine specific uses of inventions based on this molecular sequence. 

FIGURE 3 provides genomic [...] protease protein 
of the present invention. (SEQ ID NO:3) In addition, structure and functional information, such 
as [...] specific uses of inventions based on this molecular sequence. As indicated in 
Figure 3, [...] insertion/deletion polymorphisms ("indels"), were identified at 69 
[...] present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

General Description 

The present invention is based on the sequencing of the human genome. During the 
[...] fragments [...] acid [...] (allelic [...]) [...] 
information about the closest art known protein/peptide/domain that has structural or sequence 
homology to the protease of the present invention. 

In addition to being previously unknown, the peptides that are provided in the present 
invention are selected based on their ability to be used for the development of commercially 
important products and services. Specifically, the present peptides are selected based on 
homology and/or structural relatedness to known protease proteins of the serine protease 
subfamily and the expression pattern observed. Experimental data as provided in Figure 1 
indicates expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, 
bone marrow, and in cancers. The art has clearly established the commercial importance of 
members of this family of proteins and proteins that have expression patterns similar to that of 
the present gene. Some of the more specific features of the peptides of the present invention, and 
the uses thereof, are described herein, particularly in the Background of the Invention and in the 
annotation provided in the Figures, and/or are known within the art for each of the known serine 
family or subfamily of protease proteins. 

The present invention provides nucleic acid sequences that encode protein molecules that 
have been identified as being members of the protease family of proteins and are related to the 
serine protease subfamily (protein sequences are provided in Figure 2, transcript/cDNA 
sequences are provided in Figure 1 and genomic sequences are provided in Figure 3). The 
peptide sequences provided in Figure 2, as well as the obvious variants described herein, 
particularly allelic variants as identified herein and using the information in Figure 3, will be 
referred to herein as the protease peptides of the present invention, protease peptides, or 
peptides/proteins of the present invention. 

The present invention provides isolated peptide and protein molecules that consist of, 
consist essentially of, or comprise the amino acid sequences of the protease peptides disclosed in 
Figure 2 (encoded by the nucleic acid molecule shown in Figure 1, transcript/cDNA, or 
Figure 3, genomic sequence), as well as all obvious variants of these peptides that are within the 
art to make and use. Some of these variants are described in detail below. 

As used herein, a peptide is said to be "isolated" or "purified" when it is substantially free 
of cellular material or free of chemical precursors or other chemicals. The peptides of the present 
invention can be purified to homogeneity or other degrees of purity. The level of purification will 
be based on the intended use. The critical feature is that the preparation allows for the desired 
function of the peptide, even if in the presence of considerable amounts of other components (the 
features of an isolated nucleic acid molecule are discussed below). 

In some uses, "substantially free of cellular material" includes preparations of the peptide 
having less than about 30% (by dry weight) other proteins (i.e., contaminating protein), less than 
about 20% other proteins, less than about 10% other proteins, or less than about 5% other proteins. 
When the peptide is recombinantly produced, it can also be substantially free of culture medium, 
i.e., culture medium represents less than about 20% of the volume of the protein preparation. 

The language "substantially free of chemical precursors or other chemicals" includes 
preparations of the peptide in which it is separated from chemical precursors or other chemicals that 
are involved in its synthesis. In one embodiment, the language "substantially free of chemical 
precursors or other chemicals" includes preparations of the protease peptide having less than about 
30% (by dry weight) chemical precursors or other chemicals, less than about 20% chemical 
precursors or other chemicals, less than about 10% chemical precursors or other chemicals, or less 
than about 5% chemical precursors or other chemicals. 

The isolated protease peptide can be purified from cells that naturally express it, purified 
from cells that have been altered to express it (recombinant), or synthesized using known protein 
synthesis methods. Experimental data as provided in Figure 1 indicates expression in humans in 
testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. For 
example, a nucleic acid molecule encoding the protease peptide is cloned into an expression vector, 
the expression vector introduced into a host cell and the protein expressed in the host cell. The 
protein can then be isolated from the cells by an appropriate purification scheme using standard 
protein purification techniques. Many of these techniques are described in detail below. 

Accordingly, the present invention provides proteins that consist of the amino acid 
sequences provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the 
transcript/cDNA nucleic acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic 
sequences provided in Figure 3 (SEQ ID NO:3). The amino acid sequence of such a protein is 
provided in Figure 2. A protein consists of an amino acid sequence when the amino acid sequence 
is the final amino acid sequence of the protein. 

The present invention further provides proteins that consist essentially of the amino acid 
sequences provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the 
transcript/cDNA nucleic acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic 
sequences provided in Figure 3 (SEQ ID NO:3). A protein consists essentially of an amino acid 
sequence when such an amino acid sequence is present with only a few additional amino acid 
residues, for example from about 1 to about 100 or so additional residues, typically from 1 to about 
20 additional residues in the final protein. 
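
The distinction drawn above between a protein that "consists of" and one that "consists essentially of" a sequence can be sketched as a simple comparison, together with the broader case in which the sequence merely forms part of a longer protein. The sketch is illustrative only and carries no legal weight: the function name claim_relationship, the label strings, and the hard cutoff max_extra=100 are assumptions, the last being a rigid stand-in for the deliberately approximate wording "from about 1 to about 100 or so additional residues".

```python
def claim_relationship(protein, reference, max_extra=100):
    """Return the narrowest label describing how protein relates to reference.

    "consists of"             : protein is exactly the reference sequence
    "consists essentially of" : reference is present with at most max_extra
                                additional residues (assumed hard cutoff)
    "contains"                : reference is part of a longer protein
    "unrelated"               : reference does not occur in protein
    """
    protein = protein.upper()
    reference = reference.upper()
    if protein == reference:
        return "consists of"
    if reference and reference in protein:
        extra = len(protein) - len(reference)
        if extra <= max_extra:
            return "consists essentially of"
        return "contains"
    return "unrelated"


if __name__ == "__main__":
    ref = "IIGGK"  # toy sequence, not SEQ ID NO:2
    print(claim_relationship(ref, ref))
    print(claim_relationship("MA" + ref, ref))
    print(claim_relationship("A" * 150 + ref, ref))
```

Each narrower category also satisfies the broader ones, so the function reports only the narrowest label that applies.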

The present invention further provides proteins that comprise the amino acid sequences
provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the transcript/cDNA nucleic
acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic sequences provided in Figure 3
(SEQ ID NO:3). A protein comprises an amino acid sequence when the amino acid sequence is at
least part of the final amino acid sequence of the protein. In such a fashion, the protein can be only
the peptide or have additional amino acid molecules, such as amino acid residues (contiguous
encoded sequence) that are naturally associated with it or heterologous amino acid residues/peptide
sequences. Such a protein can have a few additional amino acid residues or can comprise several
hundred or more additional amino acids. The preferred classes of proteins that are comprised of the
protease peptides of the present invention are the naturally occurring mature proteins. A brief
description of how various types of these proteins can be made/isolated is provided below.
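
The three relationships defined above (a protein that consists of, consists essentially of, or comprises a reference sequence) can be illustrated with a minimal Python sketch. The function name, the placeholder sequences, and the hard cutoff of 100 additional residues (taken from the "about 1 to about 100 or so" range stated above) are assumptions made for illustration only and are not part of the disclosure.

```python
def classify_relationship(protein: str, reference: str, max_extra: int = 100) -> str:
    """Return the narrowest label describing how `protein` relates to `reference`.

    Illustrative only: `max_extra` is an assumed hard cutoff standing in for
    "about 1 to about 100 or so additional residues".
    """
    if protein == reference:
        return "consists of"
    if reference in protein:
        extra = len(protein) - len(reference)
        if extra <= max_extra:
            return "consists essentially of"
        return "comprises"
    return "none"


# Placeholder sequences (hypothetical, NOT SEQ ID NO:2).
REF = "MKTAYIAKQR"
examples = {
    "exact": REF,
    "tagged": "HHHHHH" + REF,         # a few additional residues
    "large_fusion": "G" * 150 + REF,  # more than 100 additional residues
    "unrelated": "AAAAAAAAAA",
}
for name, seq in examples.items():
    print(name, "->", classify_relationship(seq, REF))
```

The labels are nested: any protein that consists of the reference sequence also comprises it, so the sketch reports only the narrowest applicable label.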

The protease peptides of the present invention can be attached to heterologous sequences to
form chimeric or fusion proteins. Such chimeric and fusion proteins comprise a protease peptide
operatively linked to a heterologous protein having an amino acid sequence not substantially
homologous to the protease peptide. "Operatively linked" indicates that the protease peptide and the
heterologous protein are fused in-frame. The heterologous protein can be fused to the N-terminus
or C-terminus of the protease peptide.

In some uses, the fusion protein does not affect the activity of the protease peptide.
For example, the fusion protein can include, but is not limited to, enzymatic fusion proteins, for
example beta-galactosidase fusions, yeast two-hybrid GAL fusions, poly-His fusions, MYC-tagged,
HI-tagged and Ig fusions. Such fusion proteins, particularly poly-His fusions, can facilitate the
purification of recombinant protease peptide. In certain host cells (e.g., mammalian host cells),
expression and/or secretion of a protein can be increased by using a heterologous signal sequence.

A chimeric or fusion protein can be produced by standard recombinant DNA techniques. For
example, DNA fragments coding for the different protein sequences are ligated together in-frame
in accordance with conventional techniques. In another embodiment, the fusion gene can be
synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR
amplification of gene fragments can be carried out using anchor primers which give rise to
complementary overhangs between two consecutive gene fragments which can subsequently be
annealed and re-amplified to generate a chimeric gene sequence (see Ausubel et al., Current
Protocols in Molecular Biology, 1992). Moreover, many expression vectors are commercially
available that already encode a fusion moiety. A protease peptide-encoding nucleic acid can be
cloned into such an expression vector such that the fusion moiety is linked in-frame to the
protease peptide.
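
The in-frame requirement for a fusion can be illustrated with a short Python sketch. It assumes two coding fragments given as DNA strings. The function names, the poly-His tag sequence, and the placeholder protease fragment are hypothetical and serve only to show the reading-frame check, not any construct of the invention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str) -> list:
    """Split a coding sequence into consecutive three-base codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def fuse_in_frame(n_term_orf: str, c_term_orf: str) -> str:
    """Join two coding fragments so the C-terminal part stays in frame.

    Raises ValueError if either fragment is not a whole number of codons,
    or if the N-terminal fragment contains an in-frame stop codon.
    """
    for name, orf in (("N-terminal", n_term_orf), ("C-terminal", c_term_orf)):
        if len(orf) % 3 != 0:
            raise ValueError(f"{name} fragment is not a whole number of codons")
    if STOP_CODONS & set(codons(n_term_orf)):
        raise ValueError("N-terminal fragment contains an in-frame stop codon")
    return n_term_orf + c_term_orf


# Hypothetical fragments: Met plus six His codons, then a placeholder ORF.
HIS_TAG = "ATG" + "CAC" * 6
PROTEASE = "GCTGGTAAATGA"

fusion = fuse_in_frame(HIS_TAG, PROTEASE)
print(codons(fusion))
```

A single extra base in the N-terminal fragment (for example `HIS_TAG + "C"`) shifts the frame of everything downstream, which is why the sketch rejects it.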

As mentioned above, the present invention also provides and enables obvious variants of the
amino acid sequence of the proteins of the present invention, such as naturally occurring mature
forms of the peptide, allelic/sequence variants of the peptides, non-naturally occurring
recombinantly derived variants of the peptides, and orthologs and paralogs of the peptides. Such
variants can readily be generated using art-known techniques in the fields of recombinant nucleic
acid technology and protein biochemistry. It is understood, however, that variants exclude any
amino acid sequences disclosed prior to the invention.

Such variants can readily be identified/made using molecular techniques and the sequence 
information disclosed herein. Further, such variants can readily be distinguished from other 
peptides based on sequence and/or structural homology to the protease peptides of the present 
invention. The degree of homology/identity present will be based primarily on whether the peptide 
is a functional variant or non-functional variant, the amount of divergence present in the paralog
family, and the evolutionary distance between the orthologs.

To determine the percent identity of two amino acid sequences or two nucleic acid 
sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be 
introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal 
alignment and non-homologous sequences can be disregarded for comparison purposes). In a
preferred embodiment, at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% or more of the length
of a reference sequence is aligned for comparison purposes. The amino acid residues or 
nucleotides at corresponding amino acid positions or nucleotide positions are then compared. 
When a position in the first sequence is occupied by the same amino acid residue or nucleotide 
as the corresponding position in the second sequence, then the molecules are identical at that 
position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or 
nucleic acid "homology"). The percent identity between the two sequences is a function of the
number of identical positions shared by the sequences, taking into account the number of gaps
and the length of each gap, which need to be introduced for optimal alignment of the two
sequences.

The comparison of sequences and determination of percent identity and similarity
between two sequences can be accomplished using a mathematical algorithm. (Computational
Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing:
Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer
Analysis of Sequence Data, Part 1, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New
Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and
Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York,
1991). In a preferred embodiment, the percent identity between two amino acid sequences is
determined using the Needleman and Wunsch (J. Mol. Biol. (48):443-453 (1970)) algorithm
which has been incorporated into the GAP program in the GCG software package (available at
http://www.gcg.com), using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight
of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred
embodiment, the percent identity between two nucleotide sequences is determined using the
GAP program in the GCG software package (Devereux, J., et al., Nucleic Acids Res. 12(1):387
(1984)) (available at http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of
40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the
percent identity between two amino acid or nucleotide sequences is determined using the
algorithm of E. Myers and W. Miller (CABIOS, 4:11-17 (1989)) which has been incorporated
into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length
penalty of 12 and a gap penalty of 4.
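
The alignment and percent identity calculation described above can be sketched in Python as follows. This is a simplified global alignment in the style of Needleman and Wunsch, not the GAP or ALIGN programs. The match, mismatch, and linear gap scores are assumed values chosen for brevity and stand in for the substitution matrices and gap/length weights named in the text. The function names and the choice to count identity over all alignment columns (gaps included) are likewise assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns divided by alignment length (gap columns included)."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


aln_a, aln_b = needleman_wunsch("GATTACA", "GATACA")
print(aln_a)
print(aln_b)
print(round(percent_identity(aln_a, aln_b), 1))
```

Because each gap column lengthens the alignment without adding an identical position, introducing gaps lowers the reported identity, which is one way of "taking into account the number of gaps and the length of each gap".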

The nucleic acid and protein sequences of the present invention can further be used as a
"query sequence" to perform a search against sequence databases to, for example, identify other
family members or related sequences. Such searches can be performed using the NBLAST and
XBLAST programs (version 2.0) of Altschul, et al. (J. Mol. Biol. 215:403-10 (1990)). BLAST
nucleotide searches can be performed with the NBLAST program, score = 100, wordlength = 12,
to obtain nucleotide sequences homologous to the nucleic acid molecules of the invention.
BLAST protein searches can be performed with the XBLAST program, score = 50, wordlength =
3, to obtain amino acid sequences homologous to the proteins of the invention. To obtain gapped
alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et
al. (Nucleic Acids Res. 25(17):3389-3402 (1997)). When utilizing BLAST and gapped BLAST
programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can
be used.

Full-length pre-processed forms, as well as mature processed forms, of proteins that
comprise one of the peptides of the present invention can readily be identified as having complete
sequence identity to one of the protease peptides of the present invention as well as being encoded
by the same genetic locus as the protease peptide provided herein. The gene provided by the present
invention is located on a genome component that has been mapped to human chromosome 4 (as
indicated in Figure 3), which is supported by multiple lines of evidence, such as STS and BAC map
data.

Allelic variants of a protease peptide can readily be identified as being a human protein
having a high degree (significant) of sequence homology/identity to at least a portion of the protease
peptide as well as being encoded by the same genetic locus as the protease peptide provided herein.
Genetic locus can readily be determined based on the genomic information provided in Figure 3,
such as the genomic sequence mapped to the reference human. The gene provided by the present
invention is located on a genome component that has been mapped to human chromosome 4 (as
indicated in Figure 3), which is supported by multiple lines of evidence, such as STS and BAC map
data. As used herein, two proteins (or a region of the proteins) have significant homology when
the amino acid sequences are typically at least about 70-80%, 80-90%, and more typically at
least about 90-95% or more homologous. A significantly homologous amino acid sequence,
according to the present invention, will be encoded by a nucleic acid sequence that will hybridize
to a protease peptide encoding nucleic acid molecule under stringent conditions as more fully
described below.

Figure 3 provides information on SNPs that have been identified in the gene encoding the
protease protein of the present invention. SNPs, including indels (indicated by a "-"), were
identified at 69 different nucleotide positions. Non-synonymous cSNPs were identified at position
30496. The changes in the amino acid sequence caused by these SNPs are indicated in Figure 3 and
can readily be determined using the universal genetic code and the protein sequence provided in
Figure 2 as a reference. SNPs outside the ORF and in introns may affect control/regulatory
elements.
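
How an amino acid change follows from a coding SNP and the universal genetic code can be shown with a brief Python sketch. The coding sequence, the 0-based position convention, and the function name are hypothetical. They do not correspond to the positions or sequences of Figures 2 and 3.

```python
from itertools import product

# Standard (universal) genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def snp_effect(cds: str, pos: int, alt: str):
    """Classify a single-base change in a coding sequence.

    `pos` is a 0-based index into `cds` (an assumed convention).
    Returns (reference amino acid, variant amino acid, classification).
    """
    start = pos - pos % 3                     # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


CDS = "ATGGCTGAAAAA"  # placeholder ORF: Met-Ala-Glu-Lys
print(snp_effect(CDS, 5, "C"))  # GCT -> GCC
print(snp_effect(CDS, 7, "T"))  # GAA -> GTA
```

The first change alters only the third base of an alanine codon and leaves the protein unchanged, while the second replaces glutamate with valine and would therefore be reported as non-synonymous.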

Paralogs of a protease peptide can readily be identified as having some degree of significant
sequence homology/identity to at least a portion of the protease peptide, as being encoded by a gene
from humans, and as having similar activity or function. Two proteins will typically be considered
paralogs when the amino acid sequences are typically at least about 60% or greater, and more
typically at least about 70% or greater homology through a given region or domain. Such
paralogs will be encoded by a nucleic acid sequence that will hybridize to a protease peptide
encoding nucleic acid molecule under moderate to stringent conditions as more fully described
below.

Orthologs of a protease peptide can readily be identified as having some degree of 
significant sequence homology/identity to at least a portion of the protease peptide as well as being 
encoded by a gene from another organism. Preferred orthologs will be isolated from mammals, 
preferably primates, for the development of human therapeutic targets and agents. Such orthologs 
will be encoded by a nucleic acid sequence that will hybridize to a protease peptide encoding 
nucleic acid molecule under moderate to stringent conditions, as more fully described below,
depending on the degree of relatedness of the two organisms yielding the proteins. The gene
provided by the present invention is located on a genome component that has been mapped to 
human chromosome 4 (as indicated in Figure 3), which is supported by multiple lines of 
evidence, such as STS and BAC map data. 

Figure 3 provides information on SNPs that have been identified in the gene encoding the 
protease protein of the present invention. SNPs, including indels (indicated by a "-"), were
identified at 69 different nucleotide positions. Non-synonymous cSNPs were identified at position
30496. The changes in the amino acid sequence caused by these SNPs are indicated in Figure 3 and
can readily be determined using the universal genetic code and the protein sequence provided in 
Figure 2 as a reference. SNPs outside the ORF and in introns may affect control/regulatory 
elements. 

Non-naturally occurring variants of the protease peptides of the present invention can
readily be generated using recombinant techniques. Such variants include, but are not limited to,
deletions, additions and substitutions in the amino acid sequence of the protease peptide. For
example, one class of substitutions is conserved amino acid substitutions. Such substitutions are
those that substitute a given amino acid in a protease peptide by another amino acid of like
characteristics. Typically seen as conservative substitutions are the replacements, one for another,
among the aliphatic amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser
and Thr; exchange of the acidic residues Asp and Glu; substitution between the amide residues Asn
and Gln; exchange of the basic residues Lys and Arg; and replacements among the aromatic
residues Phe and Tyr. Guidance concerning which amino acid changes are likely to be
phenotypically silent is found in Bowie et al., Science 247:1306-1310 (1990).
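
The conservative substitution classes listed above can be encoded directly, as in the following Python sketch. The grouping reproduces only the six classes named in the text. The function name and the treatment of every other pair as non-conservative are simplifying assumptions.

```python
# Substitution classes as listed in the text (three-letter residue codes).
CONSERVATIVE_GROUPS = [
    {"Ala", "Val", "Leu", "Ile"},  # aliphatic
    {"Ser", "Thr"},                # hydroxyl
    {"Asp", "Glu"},                # acidic
    {"Asn", "Gln"},                # amide
    {"Lys", "Arg"},                # basic
    {"Phe", "Tyr"},                # aromatic
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same listed class and differ."""
    if original == replacement:
        return False  # no substitution took place
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


print(is_conservative("Leu", "Ile"))  # same aliphatic class
print(is_conservative("Asp", "Lys"))  # acidic to basic
```

A lookup of this kind only flags whether a change stays within a listed class. Whether a given change is actually silent still has to be established experimentally.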

Variant protease peptides can be fully functional or can lack function in one or more
activities, e.g. ability to bind substrate, ability to cleave substrate, ability to participate in a signaling
pathway, etc. Fully functional variants typically contain only conservative variation or variation in
non-critical residues or in non-critical regions. Figure 2 provides the result of protein analysis and
can be used to identify critical domains/regions. Functional variants can also contain substitution of
similar amino acids that result in no change or an insignificant change in function. Alternatively,
such substitutions may positively or negatively affect function to some degree.

Non-functional variants typically contain one or more non-conservative amino acid
substitutions, deletions, insertions, inversions, or truncation or a substitution, insertion, inversion, or
deletion in a critical residue or critical region.

Amino acids that are essential for function can be identified by methods known in the art,
such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham et al., Science
244:1081-1085 (1989)), particularly using the results provided in Figure 2. The latter procedure
introduces single alanine mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity such as protease activity or in assays such as an in
vitro proliferative activity. Sites that are critical for binding partner/substrate binding can also be
determined by structural analysis such as crystallization, nuclear magnetic resonance or
photoaffinity labeling (Smith et al., J. Mol. Biol. 224:899-904 (1992); de Vos et al. Science
255:306-312 (1992)).
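
The set of single-residue mutants used in alanine-scanning mutagenesis can be enumerated with a few lines of Python. The sketch below is illustrative: the input is a placeholder sequence in one-letter code, the mutant labels follow the common "original residue, position, new residue" convention, and residues that are already alanine are skipped as an assumed design choice.

```python
def alanine_scan(sequence: str):
    """Yield (label, mutant sequence) for each single alanine substitution."""
    for i, residue in enumerate(sequence):
        if residue == "A":
            continue  # position is already alanine; nothing to substitute
        label = f"{residue}{i + 1}A"  # e.g. K2A: Lys at position 2 -> Ala
        yield label, sequence[:i] + "A" + sequence[i + 1:]


# Placeholder peptide, not a sequence of the invention.
for label, mutant in alanine_scan("MKAD"):
    print(label, mutant)
```

Each mutant would then be expressed and assayed individually, so the number of constructs grows linearly with the length of the region scanned.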

The present invention further provides fragments of the protease peptides, in addition to 
proteins and peptides that comprise and consist of such fragments, particularly those comprising the 
residues identified in Figure 2. The fragments to which the invention pertains, however, are not to
be construed as encompassing fragments that may be disclosed publicly prior to the present 
invention. 

As used herein, a fragment comprises at least 8, 10, 12, 14, 16, or more contiguous amino
acid residues from a protease peptide. Such fragments can be chosen based on the ability to retain
one or more of the biological activities of the protease peptide or could be chosen for the ability to
perform a function, e.g. bind a substrate or act as an immunogen. Particularly important fragments
are biologically active fragments, peptides that are, for example, about 8 or more amino acids in
length. Such fragments will typically comprise a domain or motif of the protease peptide, e.g.
active site, a transmembrane domain or a substrate-binding domain. Further, possible fragments
include, but are not limited to, domain or motif containing fragments, soluble peptide fragments,
and fragments containing immunogenic structures. Predicted domains and functional sites are
readily identifiable by computer programs well known and readily available to those of skill in the
art (e.g., PROSITE analysis). The results of one such analysis are provided in Figure 2.
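
The definition of a fragment as a run of contiguous residues of at least a minimum length can be made concrete with a short Python sketch. The function name and the placeholder sequence are hypothetical. The default minimum of 8 residues is the smallest value given in the text.

```python
def contiguous_fragments(sequence: str, min_len: int = 8) -> list:
    """Return every contiguous sub-sequence of at least `min_len` residues."""
    n = len(sequence)
    return [sequence[start:start + length]
            for length in range(min_len, n + 1)
            for start in range(n - length + 1)]


# Placeholder 10-residue peptide, not a sequence of the invention.
fragments = contiguous_fragments("MKTAYIAKQR")
print(len(fragments))
print(fragments[0])
```

For a peptide of n residues the number of such fragments grows roughly with the square of n, so in practice candidates are narrowed to those spanning a predicted domain or motif.
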

Polypeptides often contain amino acids other than the 20 amino acids commonly referred to
as the 20 naturally occurring amino acids. Further, many amino acids, including the terminal amino
acids, may be modified by natural processes, such as processing and other post-translational
modifications, or by chemical modification techniques well known in the art. Common
modifications that occur naturally in protease peptides are described in basic texts, detailed
monographs, and the research literature, and they are well known to those of skill in the art (some of
these features are identified in Figure 2).

Known modifications include, but are not limited to, acetylation, acylation,
ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety,
covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid
derivative, formation of pyroglutamate, formylation, gamma carboxylation, glycosylation, GPI
anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic
processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA
mediated addition of amino acids to proteins such as arginylation, and ubiquitination.

Such modifications are well known to those of skill in the art and have been described in
great detail in the scientific literature. Several particularly common modifications, glycosylation,
lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and
ADP-ribosylation, for instance, are described in most basic texts, such as Proteins - Structure and
Molecular Properties, 2nd Ed., T.E. Creighton, W. H. Freeman and Company, New York (1993).
Many detailed reviews are available on this subject, such as by Wold, F., Posttranslational Covalent
Modification of Proteins, B.C. Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al.
(Meth. Enzymol. 182: 626-646 (1990)) and Rattan et al. (Ann. N.Y. Acad. Sci. 663:48-62 (1992)).

Accordingly, the protease peptides of the present invention also encompass derivatives or
analogs in which a substituted amino acid residue is not one encoded by the genetic code, in which
a substituent group is included, in which the mature protease peptide is fused with another
compound, such as a compound to increase the half-life of the protease peptide (for example,
polyethylene glycol), or in which the additional amino acids are fused to the mature protease
peptide, such as a leader or secretory sequence or a sequence for purification of the mature protease
peptide or a pro-protein sequence.

Protein/Peptide Uses

The proteins of the present invention can be used in substantial and specific assays
related to the functional information provided in the Figures; to raise antibodies or to elicit
another immune response; as a reagent (including the labeled reagent) in assays designed to
quantitatively determine levels of the protein (or its binding partner or ligand) in biological
fluids; and as markers for tissues in which the corresponding protein is preferentially expressed
(either constitutively or at a particular stage of tissue differentiation or development or in a
disease state). Where the protein binds or potentially binds to another protein or ligand (such as,
for example, in a protease-effector protein interaction or protease-ligand interaction), the protein
can be used to identify the binding partner/ligand so as to develop a system to identify inhibitors
of the binding interaction. Any or all of these uses are capable of being developed into reagent
grade or kit format for commercialization as commercial products.

Methods for performing the uses listed above are well known to those skilled in the art.
References disclosing such methods include "Molecular Cloning: A Laboratory Manual", 2d ed.,
Cold Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989,
and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press,
Berger, S. L. and A. R. Kimmel eds., 1987.



The potential uses of the peptides of the present invention are based primarily on the
source of the protein as well as the class/action of the protein. For example, proteases isolated
from humans and their human/mammalian orthologs serve as targets for identifying agents for
use in mammalian therapeutic applications, e.g. a human drug, particularly in modulating a
biological or pathological response in a cell or tissue that expresses the protease. Experimental
data as provided in Figure 1 indicates that protease proteins of the present invention are
expressed in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone
marrow, and in cancers. Specifically, a virtual northern blot shows expression in cancers. In
addition, PCR-based tissue screening panels indicate expression in testis, placenta, fetal lung,
fetal kidney, fetal heart, fetal brain, and bone marrow. A large percentage of pharmaceutical
agents are being developed that modulate the activity of protease proteins, particularly members
of the serine subfamily (see Background of the Invention). The structural and functional
information provided in the Background and Figures provide specific and substantial uses for the
molecules of the present invention, particularly in combination with the expression information
provided in Figure 1. Experimental data as provided in Figure 1 indicates expression in humans
in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers.
Such uses can readily be determined using the information provided herein, that which is known
in the art, and routine experimentation.

The proteins of the present invention (including variants and fragments that may have been 
disclosed prior to the present invention) are useful for biological assays related to proteases that are
related to members of the serine subfamily. Such assays involve any of the known protease 
functions or activities or properties useful for diagnosis and treatment of protease-related conditions 
that are specific for the subfamily of proteases that the one of the present invention belongs to, 
particularly in cells and tissues that express the protease. Experimental data as provided in Figure 1 
indicates that protease proteins of the present invention are expressed in humans in testis, placenta, 
fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a virtual 
northern blot shows expression in cancers. In addition, PCR-based tissue screening panels indicate 
expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone marrow. 

The proteins of the present invention are also useful in drug screening assays, in cell-based 
or cell-free systems. Cell-based systems can be native, i.e., cells that normally express the protease 
as a biopsy or expanded in cell culture. Experimental data as provided in Figure 1 indicates 
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone 
marrow, and in cancers. In an alternate embodiment, cell-based assays involve recombinant host 
cells expressing the protease protein. 

The polypeptides can be used to identify compounds that modulate protease activity of the 
protein in its natural state or an altered form that causes a specific disease or pathology associated 
with the protease. Both the proteases of the present invention and appropriate variants and 
fragments can be used in high-throughput screens to assay candidate compounds for the ability to 
bind to the protease. These compounds can be further screened against a functional protease to 
determine the effect of the compound on the protease activity. Further, these compounds can be 
tested in animal or invertebrate systems to determine activity/effectiveness. Compounds can be 
identified that activate (agonist) or inactivate (antagonist) the protease to a desired degree. 

Further, the proteins of the present invention can be used to screen a compound for the
ability to stimulate or inhibit interaction between the protease protein and a molecule that normally
interacts with the protease protein, e.g. a substrate or a component of the signal pathway that the
protease protein normally interacts with (for example, a protease). Such assays typically include the
steps of combining the protease protein with a candidate compound under conditions that allow the
protease protein, or fragment, to interact with the target molecule, and to detect the formation of a
complex between the protein and the target or to detect the biochemical consequence of the
interaction with the protease protein and the target, such as any of the associated effects of signal
transduction such as protein cleavage, cAMP turnover, and adenylate cyclase activation, etc.

Candidate compounds include, for example, 1) peptides such as soluble peptides, including
Ig-tailed fusion peptides and members of random peptide libraries (see, e.g., Lam et al., Nature
354:82-84 (1991); Houghten et al., Nature 354:84-86 (1991)) and combinatorial chemistry-derived
molecular libraries made of D- and/or L- configuration amino acids; 2) phosphopeptides (e.g.,
members of random and partially degenerate, directed phosphopeptide libraries, see, e.g., Songyang
et al., Cell 72:767-778 (1993)); 3) antibodies (e.g., polyclonal, monoclonal, humanized,
anti-idiotypic, chimeric, and single chain antibodies as well as Fab, F(ab')2, Fab expression library
fragments, and epitope-binding fragments of antibodies); and 4) small organic and inorganic
molecules (e.g., molecules obtained from combinatorial and natural product libraries).

One candidate compound is a soluble fragment of the receptor that competes for substrate 
binding. Other candidate compounds include mutant proteases or appropriate fragments containing
mutations that affect protease function and thus compete for substrate. Accordingly, a fragment that 
competes for substrate, for example with a higher affinity, or a fragment that binds substrate but 
does not allow release, is encompassed by the invention. 

The invention further includes other end point assays to identify compounds that modulate
(stimulate or inhibit) protease activity. The assays typically involve an assay of events in the signal
transduction pathway that indicate protease activity. Thus, the cleavage of a substrate,
inactivation/activation of a protein, a change in the expression of genes that are up- or
down-regulated in response to the protease protein dependent signal cascade can be assayed.

Any of the biological or biochemical functions mediated by the protease can be used as an
endpoint assay. These include all of the biochemical or biochemical/biological events described
herein, in the references cited herein, incorporated by reference for these endpoint assay targets, and
other functions known to those of ordinary skill in the art or that can be readily identified using the
information provided in the Figures, particularly Figure 2. Specifically, a biological function of a
cell or tissues that expresses the protease can be assayed. Experimental data as provided in Figure 1
indicates that protease proteins of the present invention are expressed in humans in testis, placenta,
fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a virtual
northern blot shows expression in cancers. In addition, PCR-based tissue screening panels indicate
expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone marrow.

Binding and/or activating compounds can also be screened by using chimeric protease
proteins in which the amino terminal extracellular domain, or parts thereof, the entire
transmembrane domain or subregions, such as any of the seven transmembrane segments or any of
the intracellular or extracellular loops and the carboxy terminal intracellular domain, or parts
thereof, can be replaced by heterologous domains or subregions. For example, a substrate-binding
region can be used that interacts with a different substrate than that which is recognized by the
native protease. Accordingly, a different set of signal transduction components is available as an
end-point assay for activation. This allows for assays to be performed in other than the specific host
cell from which the protease is derived.

The proteins of the present invention are also useful in competition binding assays in
methods designed to discover compounds that interact with the protease (e.g. binding partners
and/or ligands). Thus, a compound is exposed to a protease polypeptide under conditions that allow
the compound to bind or to otherwise interact with the polypeptide. Soluble protease polypeptide is
also added to the mixture. If the test compound interacts with the soluble protease polypeptide, it
decreases the amount of complex formed or activity from the protease target. This type of assay is
particularly useful in cases in which compounds are sought that interact with specific regions of the
protease. Thus, the soluble polypeptide that competes with the target protease region is designed to
contain peptide sequences corresponding to the region of interest.

To perform cell-free drug screening assays, it is sometimes desirable to immobilize either
the protease protein, or fragment, or its target molecule to facilitate separation of complexes from
uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the
assay.



Techniques for immobilizing proteins on matrices can be used in the drug screening assays.
In one embodiment, a fusion protein can be provided which adds a domain that allows the protein to
be bound to a matrix. For example, glutathione-S-transferase fusion proteins can be adsorbed onto
glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione derivatized microtitre
plates, which are then combined with the cell lysates (e.g., 35S-labeled) and the candidate
compound, and the mixture incubated under conditions conducive to complex formation (e.g., at
physiological conditions for salt and pH). Following incubation, the beads are washed to remove
any unbound label, and the matrix immobilized and radiolabel determined directly, or in the
supernatant after the complexes are dissociated. Alternatively, the complexes can be dissociated
from the matrix, separated by SDS-PAGE, and the level of protease-binding protein found in the
bead fraction quantitated from the gel using standard electrophoretic techniques. For example,
either the protein or its target molecule can be immobilized [illegible] streptavidin using
techniques well known in the art. Alternatively, antibodies reactive with the
protein but which do not interfere with binding of the protein to its target molecule can be
derivatized to the wells of the plate, and the protein trapped in the wells by [illegible]
protein-presenting wells, and the amount of complex trapped in the well can be quantitated.
Methods for detecting such complexes, in addition to those described above for the
GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the
[illegible]
associated with the target molecule.

Agents that modulate one of the protease proteins of the present invention [illegible]. It is
generally preferable to [illegible]. Such [illegible] are well known in the art [illegible].

[Illegible] tissues that express the protease. Experimental data as provided in Figure 1 indicates
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone
marrow, and in cancers. [Illegible] include the steps of administering a [illegible]
pharmaceutical composition to a subject in need of such treatment, the modulator being
identified as described herein.

[Illegible] in a two-hybrid assay [illegible] "bait proteins" [citations illegible in source, ending
with a 1993 reference and Brent, WO9[illegible]], to identify other proteins, which bind to or
interact with the protease and are involved in protease activity. Such protease-binding proteins
are also likely to be involved in the propagation of signals by the protease proteins [illegible]
elements of a protease-mediated signaling pathway. Alternatively, such protease-binding
proteins are likely to be protease inhibitors.

The two-hybrid system is based on the modular nature of most transcription factors,
which consist of separable DNA-binding
and activation domains. Briefly, the assay utilizes two
different DNA constructs. In one construct, the gene that codes for a protease protein is fused to
a gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the
other construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified
protein ("prey" or "sample") is fused to a gene that codes for the activation domain of the known
transcription factor. If the "bait" and the "prey" proteins are able to interact, in vivo, forming a
protease-dependent complex, the DNA-binding and activation domains of the transcription factor
are brought into close proximity. This proximity allows transcription of a reporter gene (e.g.,
LacZ) which is operably linked to a transcriptional regulatory site responsive to the transcription
factor. Expression of the reporter gene can be detected and cell colonies containing the
functional transcription factor can be isolated and used to obtain the cloned gene which encodes
the protein which interacts with the protease protein.

This invention further pertains to novel agents identified by the above-described 
screening assays. Accordingly, it is within the scope of this invention to further use an agent 
identified as described herein in an appropriate animal model. For example, an agent identified
as described herein (e.g., a protease-modulating agent, an antisense protease nucleic acid
molecule, a protease-specific antibody, or a protease-binding partner) can be used in an animal
or other model to determine the efficacy, toxicity, or side effects of treatment with such an agent.
Alternatively, an agent identified as described herein can be used in an animal or other model to
determine the mechanism of action of such an agent. Furthermore, this invention pertains to uses
of novel agents identified by the above-described screening assays for treatments as described 
herein. 

The protease proteins of the present invention are also useful to provide a target for
diagnosing a disease or predisposition to disease mediated by the peptide. Accordingly, the
invention provides methods for detecting the presence, or levels of, the protein (or encoding
mRNA) in a cell, tissue, or organism. Experimental data as provided in Figure 1 indicates
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone
marrow, and in cancers. The method involves contacting a biological sample with a compound
capable of interacting with the protease protein such that the interaction can be detected. Such an
assay can be provided in a single detection format or a multi-detection format such as an antibody
chip array.

One agent for detecting a protein in a sample is an antibody capable of selectively binding to
protein. A biological sample includes tissues, cells and biological fluids isolated from a subject, as
well as tissues, cells and fluids present within a subject.

The peptides of the present invention also provide targets for diagnosing active protein
activity, disease, or predisposition to disease, in a patient having a variant peptide, particularly
activities and conditions that are known for other members of the family of proteins to which the
present one belongs. Thus, the peptide can be isolated from a biological sample and assayed for the
presence of a genetic mutation that results in aberrant peptide. This includes amino acid
substitution, deletion, insertion, rearrangement (as the result of aberrant splicing events), and
inappropriate post-translational modification. Analytic methods include altered electrophoretic
mobility, altered tryptic peptide digest, altered protease activity in cell-based or cell-free assay,
alteration in substrate or antibody-binding pattern, altered isoelectric point, direct amino acid
sequencing, and any other of the known assay techniques useful for detecting mutations in a protein.
Such an assay can be provided in a single detection format or a multi-detection format such as an
antibody chip array.

In vitro techniques for detection of peptide include enzyme linked immunosorbent assays
(ELISAs), Western blots, immunoprecipitations and immunofluorescence using a detection reagent,
such as an antibody or protein binding agent. Alternatively, the peptide can be detected in vivo in a
subject by introducing into the subject a labeled anti-peptide antibody or other types of detection
agent. For example, the antibody can be labeled with a radioactive marker whose presence and
location in a subject can be detected by standard imaging techniques. Particularly useful are
methods that detect the allelic variant of a peptide expressed in a subject and methods which detect
fragments of a peptide in a sample.

The peptides are also useful in pharmacogenomic analysis. Pharmacogenomics deal with
clinically significant hereditary variations in the response to drugs due to altered drug disposition
and abnormal action in affected persons. See, e.g., Eichelbaum, M. (Clin. Exp. Pharmacol. Physiol.
23(10-11):983-985 (1996)), and [second citation illegible in source]. The
outcomes of these variations result in severe toxicity of therapeutic drugs in certain individuals or
therapeutic failure of drugs in certain individuals as a result of individual variation in metabolism.
Thus, the genotype of the individual can determine the way a therapeutic compound acts on the
body or the way the body metabolizes the compound. Further, the activity of drug metabolizing
enzymes affects both the intensity and duration of drug action. Thus, the pharmacogenomics of the
individual permit the selection of effective compounds and effective dosages of such compounds for
prophylactic or therapeutic treatment based on the individual's genotype. The discovery of genetic
polymorphisms in some drug metabolizing enzymes has explained why some patients do not obtain
the expected drug effects, show an exaggerated drug effect, or experience serious toxicity from
standard drug dosages. Polymorphisms can be expressed in the phenotype of the extensive
metabolizer and the phenotype of the poor metabolizer. Accordingly, genetic polymorphism may
lead to allelic protein variants of the protease protein in which one or more of the protease functions
in one population is different from those in another population. The peptides thus allow a target to
ascertain a genetic predisposition that can affect treatment modality. Thus, in a ligand-based
treatment, polymorphism may give rise to amino terminal extracellular domains and/or other
substrate-binding regions that are more or less active in substrate binding, and protease activation.
Accordingly, substrate dosage would necessarily be modified to maximize the therapeutic effect
within a given population containing a polymorphism. As an alternative to genotyping, specific
polymorphic peptides could be identified.

The peptides are also useful for treating a disorder characterized by an absence of, 
inappropriate, or unwanted expression of the protein. Experimental data as provided in Figure 1 
indicates expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, 
bone marrow, and in cancers. Accordingly, methods for treatment include the use of the protease 
protein or fragments. 



Antibodies 



The invention also provides antibodies that selectively bind to one of the peptides of the
present invention, a protein comprising such a peptide, as well as variants and fragments thereof.
As used herein, an antibody selectively binds a target peptide when it binds the target peptide and 
does not significantly bind to unrelated proteins. An antibody is still considered to selectively bind 
a peptide even if it also binds to other proteins that are not substantially homologous with the target 
peptide so long as such proteins share homology with a fragment or domain of the peptide target of 
the antibody. In this case, it would be understood that antibody binding to the peptide is still 
selective despite some degree of cross-reactivity. 

As used herein, an antibody is defined in terms consistent with that recognized within the
art: antibodies are multi-subunit proteins produced by a mammalian organism in response to an antigen
challenge. The antibodies of the present invention include polyclonal antibodies and monoclonal
antibodies, as well as fragments of such antibodies, including, but not limited to, Fab or F(ab')2, and
Fv fragments.

Many methods are known for generating and/or identifying antibodies to a given target 
peptide. Several such methods are described by Harlow, Antibodies, Cold Spring Harbor Press 
(1989).

In general, to generate antibodies, an isolated peptide is used as an immunogen and is
administered to a mammalian organism, such as a rat, rabbit or mouse. The full-length protein, an
antigenic peptide fragment or a fusion protein can be used. Particularly important fragments are
those covering functional domains, such as the domains identified in Figure 2, and domains of
sequence homology or divergence amongst the family, such as those that can readily be identified
using protein alignment methods and as presented in the Figures.

Antibodies are preferably prepared from regions or discrete fragments of the protease
proteins. Antibodies can be prepared from any region of the peptide as described herein.
However, preferred regions will include those involved in function/activity and/or
protease/binding partner interaction. Figure 2 can be used to identify particularly important
regions while sequence alignment can be used to identify conserved and unique sequence
fragments.

An antigenic fragment will typically comprise at least 8 contiguous amino acid residues.
The antigenic peptide can comprise, however, at least 10, 12, 14, 16 or more amino acid residues.
Such fragments can be selected on a physical property, such as fragments that correspond to regions that
are located on the surface of the protein, e.g., hydrophilic regions, or can be selected based on
sequence uniqueness (see Figure 2).
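
By way of non-limiting illustration, the selection of hydrophilic candidate fragments of at least 8 contiguous residues can be sketched computationally. The sketch below is an assumption-laden example and not a method stated in this disclosure: it uses the published Kyte-Doolittle hydropathy scale as a stand-in for "hydrophilic regions", and the function name `hydrophilic_windows` and its parameters are hypothetical. Any input sequence is a placeholder; no sequence from the Figures is reproduced here.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
# Using this particular scale is an assumption made for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_windows(protein, window=8, top=3):
    """Return the `top` most hydrophilic windows of `window` contiguous residues.

    Each result is a tuple (start_index, peptide, mean_hydropathy), sorted from
    most hydrophilic (lowest mean) upward. Residues outside the 20 standard
    amino acids raise KeyError.
    """
    protein = protein.upper()
    scored = []
    for start in range(len(protein) - window + 1):
        peptide = protein[start:start + window]
        mean = sum(KD[aa] for aa in peptide) / window
        scored.append((start, peptide, mean))
    scored.sort(key=lambda item: item[2])
    return scored[:top]


if __name__ == "__main__":
    # Placeholder sequence, not taken from SEQ ID NO:2.
    for start, peptide, mean in hydrophilic_windows("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"):
        print(start, peptide, round(mean, 2))
```

A low mean hydropathy only suggests surface exposure; a candidate fragment chosen this way would still need to be checked for sequence uniqueness as described above.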

Detection of an antibody of the present invention can be facilitated by coupling (i.e.,
physically linking) the antibody to a detectable substance. Examples of detectable substances
include various enzymes, prosthetic groups, fluorescent materials, luminescent materials,
bioluminescent materials, and radioactive materials. Examples of suitable enzymes include
horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of
suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of
suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate,
rhodamine, dichlorotriazinylamine fluorescein [illegible]; an example of a
luminescent material includes luminol; examples of bioluminescent materials include luciferase,
luciferin, and aequorin, and examples of suitable radioactive material include 125I, 131I, 35S or 3H.

The antibodies can be used to isolate one of the proteins of the present invention by standard
techniques, such as affinity chromatography [illegible]. The antibodies can facilitate
the purification of the natural protein from cells and recombinantly produced protein expressed in
host cells. In addition, such antibodies are useful to detect the presence of one of the proteins of the
present invention in cells or tissues to determine the pattern of expression of the protein among
various tissues in an organism and over the course of normal development. Experimental data as
provided in Figure 1 indicates that protease proteins of the present invention are expressed in
humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in
cancers. Specifically, a virtual northern blot shows expression in cancers. In addition, PCR-based
tissue screening panels indicate expression in testis, placenta, fetal lung, fetal kidney, fetal heart,
fetal brain, and bone marrow. Further, such antibodies can be used to detect protein in situ, in vitro,
or in a cell lysate or supernatant in order to evaluate the abundance and pattern of expression. Also,
such antibodies can be used to assess abnormal tissue distribution or abnormal expression during
development or progression of a biological condition. Antibody detection of circulating fragments
of the full length protein can be used to identify turnover.

Further, the antibodies can be used to assess expression in disease states such as in active 
stages of the disease or in an individual with a predisposition toward disease related to the protein's 
function. When a disorder is caused by an inappropriate tissue distribution, developmental 
expression, level of expression of the protein, or expressed/processed form, the antibody can be 
prepared against the normal protein. Experimental data as provided in Figure 1 indicates expression
in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in
cancers. If a disorder is characterized by a specific mutation in the protein, antibodies specific for
this mutant protein can be used to assay for the presence of the specific mutant protein. 

The antibodies can also be used to assess normal and aberrant subcellular localization of
cells in the various tissues in an organism. Experimental data as provided in Figure 1 indicates
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone
marrow, and in cancers. The diagnostic uses can be applied, not only in genetic testing, but also in
monitoring a treatment modality. Accordingly, where treatment is ultimately aimed at correcting
expression level or the presence of aberrant sequence and aberrant tissue distribution or
developmental expression, antibodies directed against the protein or relevant fragments can be used
to monitor therapeutic efficacy.

Additionally, antibodies are useful in pharmacogenomic analysis. Thus, antibodies prepared
against polymorphic proteins can be used to identify individuals that require modified treatment
modalities. The antibodies are also useful as diagnostic tools as an immunological marker for
aberrant protein analyzed by electrophoretic mobility, isoelectric point, tryptic peptide digest, and
other physical assays known to those in the art.

The antibodies are also useful for tissue typing. Experimental data as provided in Figure 1
indicates expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, 
bone marrow, and in cancers. Thus, where a specific protein has been correlated with expression in 
a specific tissue, antibodies that are specific for this protein can be used to identify a tissue type. 

The antibodies are also useful for inhibiting protein function, for example, blocking the 
binding of the protease peptide to a binding partner such as a substrate. These uses can also be 
applied in a therapeutic context in which treatment involves inhibiting the protein's function. An
antibody can be used, for example, to block binding, thus modulating (agonizing or antagonizing)
the peptide's activity. Antibodies can be prepared against specific fragments containing sites
required for function or against intact protein that is associated with a cell or cell membrane. See
Figure 2 for structural information relating to the proteins of the present invention.

The invention also encompasses kits for using antibodies to detect the presence of a protein
in a biological sample. The kit can comprise antibodies such as a labeled or labelable antibody and
a compound or agent for detecting protein in a biological sample; means for determining the amount
of protein in the sample; means for comparing the amount of protein in the sample with a standard; 
and instructions for use. Such a kit can be supplied to detect a single protein or epitope or can be 
configured to detect one of a multitude of epitopes, such as in an antibody detection array. Arrays 
are described in detail below for nucleic acid arrays and similar methods have been developed for 
antibody arrays. 



Nucleic Acid Molecules 



The present invention further provides isolated nucleic acid molecules that encode a 
protease peptide or protein of the present invention (cDNA, transcript and genomic sequence).
Such nucleic acid molecules will consist of, consist essentially of, or comprise a nucleotide
sequence that encodes one of the protease peptides of the present invention, an allelic variant
thereof, or an ortholog or paralog thereof.

As used herein, an "isolated" nucleic acid molecule is one that is separated from other 
nucleic acid present in the natural source of the nucleic acid. Preferably, an "isolated" nucleic acid 
is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3'
ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is
derived. However, there can be some flanking nucleotide sequences, for example up to about 5KB,
4KB, 3KB, 2KB, or 1KB or less, particularly contiguous peptide encoding sequences and peptide
encoding sequences within the same gene but separated by introns in the genomic sequence. The
important point is that the nucleic acid is isolated from remote and unimportant flanking sequences
such that it can be subjected to the specific manipulations described herein such as recombinant
expression, preparation of probes and primers, and other uses specific to the nucleic acid sequences.

Moreover, an "isolated" nucleic acid molecule, such as a transcript/cDNA molecule, can be
substantially free of other cellular material, or culture medium when produced by recombinant
techniques, or chemical precursors or other chemicals when chemically synthesized. However, the
nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered
isolated.

For example, recombinant DNA molecules contained in a vector are considered isolated.
Further examples of isolated DNA molecules include recombinant DNA molecules maintained in
heterologous host cells or purified (partially or substantially) DNA molecules in solution. Isolated
RNA molecules include in vivo or in vitro RNA transcripts of the isolated DNA molecules of the
present invention. Isolated nucleic acid molecules according to the present invention further include
such molecules produced synthetically.

Accordingly, the present invention provides nucleic acid molecules that consist of the 
nucleotide sequence shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2,
SEQ ID NO:2. A nucleic acid molecule consists of a nucleotide sequence when the nucleotide
sequence is the complete nucleotide sequence of the nucleic acid molecule. 

The present invention further provides nucleic acid molecules that consist essentially of the
nucleotide sequence shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2,
SEQ ID NO:2. A nucleic acid molecule consists essentially of a nucleotide sequence when such a
nucleotide sequence is present with only a few additional nucleic acid residues in the final nucleic 
acid molecule. 

The present invention further provides nucleic acid molecules that comprise the nucleotide 
sequences shown in Figure 1 or 3 (SEQ ID NO: 1, transcript sequence and SEQ ID NO:3, genomic 
sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2, SEQ ID 
NO:2. A nucleic acid molecule comprises a nucleotide sequence when the nucleotide sequence is at 
least part of the final nucleotide sequence of the nucleic acid molecule. In such a fashion, the 
nucleic acid molecule can be only the nucleotide sequence or have additional nucleic acid residues 
such as nucleic acid residues that are naturally associated with it or heterologous nucleotide
sequences. Such a nucleic acid molecule can have a few additional nucleotides or can comprise
several hundred or more additional nucleotides. A brief description of how various types of these 
nucleic acid molecules can be readily made/isolated is provided below. 
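
By way of non-limiting illustration, the three definitions given above ("consists of", "consists essentially of", and "comprises") can be expressed as a simple string comparison. This is only a sketch under stated assumptions: the function name `sequence_relation` is hypothetical, and the cutoff `few` standing in for "only a few additional nucleic acid residues" is an assumed value, since the text gives no number. The function reports the narrowest term that applies; under the definitions above, a molecule that consists of a sequence also comprises it.

```python
def sequence_relation(molecule, reference, few=10):
    """Classify how `molecule` relates to `reference` (nucleotide strings).

    `few` is an assumed cutoff for "only a few additional" residues; the
    specification does not quantify it. The narrowest applicable term is
    returned.
    """
    molecule, reference = molecule.upper(), reference.upper()
    if not reference:
        raise ValueError("reference sequence must not be empty")
    if molecule == reference:
        # The reference is the complete sequence of the molecule.
        return "consists of"
    if reference in molecule:
        extra = len(molecule) - len(reference)
        # Reference present with additional flanking residues.
        return "consists essentially of" if extra <= few else "comprises"
    return "does not comprise"


if __name__ == "__main__":
    # Placeholder sequences, not taken from SEQ ID NO:1 or SEQ ID NO:3.
    print(sequence_relation("GGATGCTT", "ATGC"))
```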

In Figures 1 and 3, both coding and non-coding sequences are provided. Because of the
source of the present invention, human genomic sequence (Figure 3) and cDNA/transcript
sequences (Figure 1), the nucleic acid molecules in the Figures will contain genomic intronic
sequences, 5' and 3' non-coding sequences, gene regulatory regions and non-coding intergenic
sequences. In general, such sequence features are either noted in Figures 1 and 3 or can readily
be identified using computational tools known in the art. As discussed below, some of the
non-coding regions, particularly gene regulatory elements such as promoters, are useful for a variety
of purposes, e.g., control of heterologous gene expression, target for identifying gene activity
modulating compounds, and are particularly claimed as fragments of the genomic sequence
provided herein.

The isolated nucleic acid molecules can encode the mature protein plus additional amino or
carboxyl-terminal amino acids, or amino acids interior to the mature peptide (when the mature form
has more than one peptide chain, for instance). Such sequences may play a role in processing of a
protein from precursor to a mature form, facilitate protein trafficking, prolong or shorten protein
half-life or facilitate manipulation of a protein for assay or production, among other things. As
generally is the case in situ, the additional amino acids may be processed away from the mature
protein by cellular enzymes.

As mentioned above, the isolated nucleic acid molecules include, but are not limited to, the
sequence encoding the protease peptide alone, the sequence encoding the mature peptide and
additional coding sequences, such as a leader or secretory sequence (e.g., a pre-pro or pro-protein
sequence), the sequence encoding the mature peptide, with or without the additional coding
sequences, plus additional non-coding sequences, for example introns and non-coding 5' and 3'
sequences such as transcribed but non-translated sequences that play a role in transcription, mRNA
processing (including splicing and polyadenylation signals), ribosome binding and stability of
mRNA. In addition, the nucleic acid molecule may be fused to a marker sequence encoding, for
example, a peptide that facilitates purification.

Isolated nucleic acid molecules can be in the form of RNA, such as mRNA, or in the form of
DNA, including cDNA and genomic DNA obtained by cloning or produced by chemical synthetic
techniques or by a combination thereof. The nucleic acid, especially DNA, can be double-stranded
or single-stranded. Single-stranded nucleic acid can be the coding strand (sense strand) or the
non-coding strand (anti-sense strand).
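
By way of non-limiting illustration, the relationship between the coding (sense) strand and the non-coding (anti-sense) strand of a DNA molecule can be sketched as a reverse-complement operation. The function name `antisense_strand` is hypothetical, the sketch assumes an unambiguous A/C/G/T alphabet, and the input is a placeholder rather than any sequence from the Figures.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_strand(sense):
    """Return the anti-sense strand of a DNA sense strand, written 5' to 3'.

    The sense strand is complemented base by base and then reversed, so that
    both strands are read in the conventional 5'-to-3' direction.
    """
    return sense.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Placeholder input, not taken from SEQ ID NO:1.
    print(antisense_strand("ATGGCC"))
```

Applying the function twice returns the original strand, which is a convenient sanity check when preparing probe or antisense sequences.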

The invention further provides nucleic acid molecules that encode fragments of the peptides
of the present invention as well as nucleic acid molecules that encode obvious variants of the
protease proteins of the present invention that are described above. Such nucleic acid molecules
may be naturally occurring, such as allelic variants (same locus), paralogs (different locus), and
orthologs (different organism), or may be constructed by recombinant DNA methods or by
chemical synthesis. Such non-naturally occurring variants may be made by mutagenesis
techniques [remainder of sentence illegible in source]. [Illegible] the variants can [illegible]
insertions. Variation can occur in either or [illegible]. The variations
can produce both conservative and non-conservative amino acid substitutions.

The present invention further provides non-coding fragments of the nucleic acid molecules
provided in Figures 1 and 3. Preferred non-coding fragments include, but are not limited to,
promoter sequences, enhancer sequences, gene modulating sequences and gene termination
sequences. Such fragments are useful in controlling heterologous gene expression and in
developing screens to identify gene-modulating agents. A promoter can readily be identified as
being 5' to the ATG start site in the genomic sequence provided in Figure 3.

A fragment comprises a contiguous nucleotide sequence greater than 12 or more 
nucleotides. Further, a fragment could be at least 30, 40, 50, 100, 250 or 500 nucleotides in length.
The length of the fragment will be based on its intended use. For example, the fragment can encode
epitope bearing regions of the peptide, or can be useful as DNA probes and primers. Such
fragments can be isolated using the known nucleotide sequence to synthesize an oligonucleotide
probe. A labeled probe can then be used to screen a cDNA library, genomic DNA library, or
[illegible] reactions to clone specific regions of gene.

A probe/primer typically comprises substantially a purified oligonucleotide or
oligonucleotide pair. The oligonucleotide typically comprises a region of nucleotide sequence that
hybridizes under stringent conditions to at least about 12, 20, 25, 40, 50 or more consecutive
nucleotides.
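
By way of non-limiting illustration, candidate probe regions of a given number of consecutive nucleotides can be enumerated from a known sequence with a sliding window. The sketch below is an assumption-based example: the function names `candidate_probes` and `gc_fraction` are hypothetical, the lower bound of 12 mirrors the shortest length recited above, and GC content is shown only as one commonly used screening property that this text does not itself require.

```python
def candidate_probes(sequence, length=12):
    """List every window of `length` consecutive nucleotides in `sequence`.

    Returns (start_index, probe) tuples. Lengths below 12 are rejected here
    to mirror the shortest probe length recited in the text.
    """
    if length < 12:
        raise ValueError("probe length should be at least 12 nucleotides")
    sequence = sequence.upper()
    return [
        (start, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def gc_fraction(probe):
    """Fraction of G and C bases in a probe (an illustrative screening metric)."""
    probe = probe.upper()
    return (probe.count("G") + probe.count("C")) / len(probe)


if __name__ == "__main__":
    # Placeholder sequence, not taken from SEQ ID NO:1 or SEQ ID NO:3.
    for start, probe in candidate_probes("ATGGCCAAGCTTGGATCCGAATTC", length=20):
        print(start, probe, round(gc_fraction(probe), 2))
```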

Orthologs, homologs, and allelic variants can be identified using methods well known in the
art. [Illegible] 70-80%, 80-90%, and more typically at least about [illegible]% or
more homologous to the nucleotide sequence shown in the Figure sheets or a fragment of this
sequence. Such nucleic acid molecules can readily be identified as being able to hybridize under
moderate to stringent conditions, to the nucleotide sequence shown in the Figure sheets or a
fragment of the sequence. Allelic variants can readily be determined by genetic locus of the
encoding gene.

As used herein, the term "hybridizes under stringent conditions" is intended to describe
conditions for hybridization and washing under which nucleotide sequences encoding a peptide at
least 60-70% homologous to each other typically remain hybridized to each other. The conditions
can be such that sequences at least about 60%, at least about 70%, or at least about 80% or more
homologous to each other typically remain hybridized to each other. Such stringent conditions are
known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John
Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. One example of stringent hybridization conditions is
hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more
washes in 0.2X SSC, 0.1% SDS at 50-65°C. Examples of moderate to low stringency hybridization
conditions are well known in the art.
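
By way of non-limiting illustration, the percent homology thresholds recited above (about 60%, 70%, or 80%) can be compared against a simple percent identity computed over two pre-aligned sequences. This is a sketch under stated assumptions and not a substitute for hybridization: the function name `percent_identity` is hypothetical, the inputs are assumed to be already aligned and of equal length with "-" marking gaps, and identity over the alignment length is only one of several ways homology is calculated.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must already be aligned to equal length; "-" marks a gap.
    Columns containing a gap are counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Placeholder alignment, not taken from the Figures.
    identity = percent_identity("ACGTACGTAC", "ACGTACGTTT")
    for threshold in (60, 70, 80):
        print(threshold, identity >= threshold)
```

Whether two sequences actually remain hybridized also depends on length, base composition, salt, and temperature, so a computed identity is best treated as a screening estimate.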

Nucleic Acid Molecule Uses 

The nucleic acid molecules of the present invention are useful for probes, primers, chemical
intermediates, and in biological assays. The nucleic acid molecules are useful as a hybridization
probe for messenger RNA, transcript/cDNA and genomic DNA to isolate full-length cDNA and
genomic clones encoding the peptide described in Figure 2 and to isolate cDNA and genomic
clones that correspond to variants (alleles, orthologs, etc.) producing the same or related peptides
shown in Figure 2. As indicated in Figure 3, SNPs, including insertion/deletion polymorphisms
("indels"), were identified at 69 different nucleotide positions in and around the gene encoding the
protease protein of the present invention.



The probe can correspond to any sequence along the entire length of the nucleic acid
molecules provided in the Figures. Accordingly, it could be derived from 5' noncoding regions, the
[remainder of sentence missing in source].

The nucleic acid molecules are also useful as primers for PCR to amplify any given region
of a nucleic acid molecule and are useful to synthesize antisense molecules of desired length and
sequence.

The nucleic acid molecules are also useful for constructing recombinant vectors. Such
vectors include expression vectors that express a portion of, or all of, the peptide sequences.
Vectors also include insertion vectors, used to integrate into another nucleic acid molecule
sequence, such as into the cellular genome, to alter expression of a gene and/or gene product.
For example, an endogenous coding sequence can be replaced via homologous recombination with
all or part of the coding region containing one or more specifically introduced mutations.

The nucleic acid molecules are also useful for expressing antigenic portions of the proteins.

The nucleic acid molecules are also useful as probes for determining the chromosomal
positions of the nucleic acid molecules by means of in situ hybridization methods. The gene 
provided by the present invention is located on a genome component that has been mapped to 
human chromosome 4 (as indicated in Figure 3), which is supported by multiple lines of evidence, 
such as STS and BAC map data. 

The nucleic acid molecules are also useful in making vectors containing the gene regulatory 
regions of the nucleic acid molecules of the present invention. 

The nucleic acid molecules are also useful for designing ribozymes corresponding to all, or
a part, of the mRNA produced from the nucleic acid molecules described herein. 

The nucleic acid molecules are also useful for making vectors that express part, or all, of the 
peptides. 

The nucleic acid molecules are also useful for constructing host cells expressing a part, or 
all, of the nucleic acid molecules and peptides. 

The nucleic acid molecules are also useful for constructing transgenic animals expressing 
all, or a part, of the nucleic acid molecules and peptides. 

The nucleic acid molecules are also useful as hybridization probes for determining the
presence, level, form and distribution of nucleic acid expression. Experimental data as provided in 
Figure 1 indicates that protease proteins of the present invention are expressed in humans in testis, 
placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a 
virtual northern blot shows expression in cancers. In addition, PCR-based tissue screening panels 
indicate expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone 
marrow. Accordingly, the probes can be used to detect the presence of, or to determine levels of a 
specific nucleic acid molecule in cells, tissues, and in organisms. The nucleic acid whose level is 
determined can be DNA or RNA. Accordingly, probes corresponding to the peptides described 
herein can be used to assess expression and/or gene copy number in a given cell, tissue, or 
organism. These uses are relevant for diagnosis of disorders involving an increase or decrease in 
protease protein expression relative to normal results. 

In vitro techniques for detection of mRNA include Northern hybridizations and in situ 
hybridizations. In vitro techniques for detecting DNA include Southern hybridizations and in situ 
hybridization. 

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues that 
express a protease protein, such as by measuring a level of a protease-encoding nucleic acid in a 
sample of cells from a subject, e.g., mRNA or genomic DNA, or determining if a protease gene has 
been mutated. Experimental data as provided in Figure 1 indicates that protease proteins of the 
present invention are expressed in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, 
fetal brain, bone marrow, and in cancers. Specifically, a virtual northern blot shows expression in 
cancers. In addition, PCR-based tissue screening panels indicate expression in testis, placenta, fetal 
lung, fetal kidney, fetal heart, fetal brain, and bone marrow. 

Nucleic acid expression assays are useful for drug screening to identify compounds that 
modulate protease nucleic acid expression. 

The invention thus provides a method for identifying a compound that can be used to treat a 
disorder associated with nucleic acid expression of the protease gene, particularly biological and 
pathological processes that are mediated by the protease in cells and tissues that express it. 
Experimental data as provided in Figure 1 indicates expression in humans in testis, placenta, fetal 
lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. The method typically 
includes assaying the ability of the compound to modulate the expression of the protease nucleic 
acid and thus identifying a compound that can be used to treat a disorder characterized by undesired 
protease nucleic acid expression. The assays can be performed in cell-based and cell-free systems. 
Cell-based assays include cells naturally expressing the protease nucleic acid or recombinant cells 
genetically engineered to express specific nucleic acid sequences. 

The assay for protease nucleic acid expression can involve direct assay of nucleic acid 
levels, such as mRNA levels, or assay of collateral compounds involved in the signal pathway. Further, 
the expression of genes that are up- or down-regulated in response to the protease protein signal 
pathway can also be assayed. In this embodiment the regulatory regions of these genes can be 
operably linked to a reporter gene such as luciferase. 

Thus, modulators of protease gene expression can be identified in a method wherein a cell is 
contacted with a candidate compound and the expression of mRNA determined. The level of 
expression of protease mRNA in the presence of the candidate compound is compared to the level 
of expression of protease mRNA in the absence of the candidate compound. The candidate 
compound can then be identified as a modulator of nucleic acid expression based on this 
comparison and be used, for example, to treat a disorder characterized by aberrant nucleic acid 
expression. When expression of mRNA is statistically significantly greater in the presence of the 
candidate compound than in its absence, the candidate compound is identified as a stimulator of 
nucleic acid expression. When nucleic acid expression is statistically significantly less in the 
presence of the candidate compound than in its absence, the candidate compound is identified as an 
inhibitor of nucleic acid expression. 
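
The comparison described above can be sketched in code. The following is a minimal illustration only, assuming replicate mRNA measurements (for example, normalized reporter or hybridization signals) are available with and without the candidate compound. The specification does not name a statistical test; an exact permutation test is used here purely as one possible choice, and the function names and the 0.05 threshold are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, untreated):
    """Exact two-sided permutation test on the difference in mean expression."""
    pooled = list(treated) + list(untreated)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(untreated))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling the pooled measurements.
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_compound(treated, untreated, alpha=0.05):
    """Label a candidate compound from expression with and without it."""
    if permutation_p_value(treated, untreated) >= alpha:
        return "no significant effect"
    if mean(treated) > mean(untreated):
        return "stimulator"
    return "inhibitor"
```

With four replicates per condition the smallest attainable p-value is 2/70, so a threshold of 0.05 can be met; with only three replicates per condition it cannot (2/20), which is one reason the number of replicates matters when applying a test of this kind.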

The invention further provides methods of treatment, with the nucleic acid as a target, using 
a compound identified through drug screening as a gene modulator to modulate protease nucleic 
acid expression in cells and tissues that express the protease. Experimental data as provided in 
Figure 1 indicates that protease proteins of the present invention are expressed in humans in testis, 
placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a 
virtual northern blot shows expression in cancers. In addition, PCR-based tissue screening panels 
indicate expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone 
marrow. Modulation includes both up-regulation (i.e. activation or agonization) and down-regulation 
(i.e. suppression or antagonization) of nucleic acid expression. 

Alternatively, a modulator for protease nucleic acid expression can be a small molecule or 
drug identified using the screening assays described herein as long as the drug or small molecule 
inhibits the protease nucleic acid expression in the cells and tissues that express the protein. 
Experimental data as provided in Figure 1 indicates expression in humans in testis, placenta, fetal 
lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. 

The nucleic acid molecules are also useful for monitoring the effectiveness of modulating 
compounds on the expression or activity of the protease gene in clinical trials or in a treatment 
regimen. Thus, the gene expression pattern can serve as a barometer for the continuing 
effectiveness of treatment with the compound, particularly with compounds to which a patient can 
develop resistance. The gene expression pattern can also serve as a marker indicative of a 
physiological response of the affected cells to the compound. Accordingly, such monitoring would 
allow either increased administration of the compound or the administration of alternative 
compounds to which the patient has not become resistant. Similarly, if the level of nucleic acid 
expression falls below a desirable level, administration of the compound could be commensurately 
decreased. 

The nucleic acid molecules are also useful in diagnostic assays for qualitative changes in 
protease nucleic acid expression, and particularly in qualitative changes that lead to pathology. The 
nucleic acid molecules can be used to detect mutations in protease genes and gene expression 
products such as mRNA. The nucleic acid molecules can be used as hybridization probes to detect 
naturally occurring genetic mutations in the protease gene and thereby to determine whether a 
subject with the mutation is at risk for a disorder caused by the mutation. Mutations include 
deletion, addition, or substitution of one or more nucleotides in the gene, chromosomal 
rearrangement, such as inversion or transposition, modification of genomic DNA, such as aberrant 
methylation patterns or changes in gene copy number, such as amplification. Detection of a 
mutated form of the protease gene associated with a dysfunction provides a diagnostic tool for an 
active disease or susceptibility to disease when the disease results from overexpression, 
underexpression, or altered expression of a protease protein. 

Individuals carrying mutations in the protease gene can be detected at the nucleic acid level 
by a variety of techniques. Figure 3 provides information on SNPs that have been identified in the 
gene encoding the protease protein of the present invention. SNPs, including indels (indicated by a 
"-"), were identified at 69 different nucleotide positions. Non-synonymous cSNPs were identified at 
position 30496. The changes in the amino acid sequence caused by these SNPs are indicated in 
Figure 3 and can readily be determined using the universal genetic code and the protein sequence 
provided in Figure 2 as a reference. SNPs outside the ORF and in introns may affect 
control/regulatory elements. The gene provided by the present invention is located on a genome 
component that has been mapped to human chromosome 4 (as indicated in Figure 3), which is 
supported by multiple lines of evidence, such as STS and BAC map data. Genomic DNA can be 
analyzed directly or can be amplified by using PCR prior to analysis. RNA or cDNA can be used in 
the same way. In some uses, detection of the mutation involves the use of a probe/primer in a 
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such as 
anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g., 
Landegran et al., Science 247:1077-1080 (1988); and Nakazawa et al., PNAS 97:360-364 (1994)), 
the latter of which can be particularly useful for detecting point mutations in the gene (see Abravaya 
et al., Nucleic Acids Res. 23:675-682 (1995)). This method can include the steps of collecting a 
sample of cells from a patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells 
of the sample, contacting the nucleic acid sample with one or more primers which specifically 
hybridize to a gene under conditions such that hybridization and amplification of the gene (if 
present) occurs, and detecting the presence or absence of an amplification product, or detecting the 
size of the amplification product and comparing the length to a control sample. Deletions and 
insertions can be detected by a change in size of the amplified product compared to the normal 
genotype. Point mutations can be identified by hybridizing amplified DNA to normal RNA or 
antisense DNA sequences. 
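
The size comparison described above can be illustrated in silico. The sketch below assumes that the sample and control templates are available as plain DNA strings and that each primer matches its site exactly; the function names, primers, and sequences are hypothetical and are not taken from the Figures.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by two primers, or None if no product.

    `forward` matches the top strand; `reverse` anneals to the top strand,
    so its reverse complement is searched for downstream of `forward`.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


def compare_to_control(sample_len, control_len):
    """Classify a sample amplicon by its size relative to the control."""
    if sample_len is None:
        return "no product"
    if sample_len > control_len:
        return "insertion"
    if sample_len < control_len:
        return "deletion"
    return "no size change"
```

A point mutation leaves the amplicon length unchanged, which is why the text turns to hybridization-based approaches for that case.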

Alternatively, mutations in a protease gene can be directly identified, for example, by 
alterations in restriction enzyme digestion patterns determined by gel electrophoresis. 
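
As an illustration of how a mutation alters a digestion pattern, the sketch below computes the fragment lengths expected from a linear DNA molecule. EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example; the sequences and the function name are hypothetical, and only the top strand is considered.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return sorted fragment lengths after cutting a linear DNA string.

    `cut_offset` is the number of bases of the recognition site that stay
    on the upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))
```

A substitution that destroys one recognition site merges two fragments into one, so the band pattern seen on a gel changes even though the total length of the molecule does not.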

Further, sequence-specific ribozymes (U.S. Patent No. 5,498,531) can be used to score for 
the presence of specific mutations by development or loss of a ribozyme cleavage site. Perfectly 
matched sequences can be distinguished from mismatched sequences by nuclease cleavage 
digestion assays or by differences in melting temperature. 

Sequence changes at specific locations can also be assessed by nuclease protection assays 
such as RNase and S1 protection or the chemical cleavage method. Furthermore, sequence 
differences between a mutant protease gene and a wild-type gene can be determined by direct DNA 
sequencing. A variety of automated sequencing procedures can be utilized when performing the 
diagnostic assays (Naeve, C.W., (1995) Biotechniques 19:448), including sequencing by mass 
spectrometry (see, e.g., PCT International Publication No. WO 94/16101; Cohen et al., Adv. 
Chromatogr. 36:127-162 (1996); and Griffin et al., Appl. Biochem. Biotechnol. 35:147-159 (1993)). 

Other methods for detecting mutations in the gene include methods in which protection 
from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA duplexes 
(Myers et al., Science 230:1242 (1985); Cotton et al., PNAS 55:4397 (1988); Saleeba et al., Meth. 
Enzymol. 277:286-295 (1992)), electrophoretic mobility of mutant and wild type nucleic acid is 
compared (Orita et al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. 255:125-144 (1993); and 
Hayashi et al., Genet. Anal. Tech. Appl. 9:73-79 (1992)), and movement of mutant or wild-type 
fragments in polyacrylamide gels containing a gradient of denaturant is assayed using denaturing 
gradient gel electrophoresis (Myers et al., Nature 313:495 (1985)). Examples of other techniques 
for detecting point mutations include selective oligonucleotide hybridization, selective 
amplification, and selective primer extension. 

The nucleic acid molecules are also useful for testing an individual for a genotype that, while 
not necessarily causing the disease, nevertheless affects the treatment modality. Thus, the nucleic 
acid molecules can be used to study the relationship between an individual's genotype and the 
individual's response to a compound used for treatment (pharmacogenomic relationship). 
Accordingly, the nucleic acid molecules described herein can be used to assess the mutation content 
of the protease gene in an individual in order to select an appropriate compound or dosage regimen 
for treatment. 

Thus nucleic acid molecules displaying genetic variations that affect treatment provide a 
diagnostic target that can be used to tailor treatment in an individual. Accordingly, the production 
of recombinant cells and animals containing these polymorphisms allows effective clinical design of 
treatment compounds and dosage regimens. 

The nucleic acid molecules are thus useful as antisense constructs to control protease gene 
expression in cells, tissues, and organisms. A DNA antisense nucleic acid molecule is designed to 
be complementary to a region of the gene involved in transcription, preventing transcription and 
hence production of protease protein. An antisense RNA or DNA nucleic acid molecule would 
hybridize to the mRNA and thus block translation of mRNA into protease protein. Figure 3 
provides information on SNPs that have been identified in the gene encoding the protease protein of 
the present invention. SNPs, including indels (indicated by a "-"), were identified at 69 different 
nucleotide positions. Non-synonymous cSNPs were identified at position 30496. The changes in the 
amino acid sequence caused by these SNPs are indicated in Figure 3 and can readily be determined 
using the universal genetic code and the protein sequence provided in Figure 2 as a reference. SNPs 
outside the ORF and in introns may affect control/regulatory elements. 
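
The antisense principle described above, in which a DNA molecule hybridizes to the mRNA through base complementarity, can be shown with a short sketch. It assumes the target mRNA region is given as a string over A, C, G, U; the function name and the example sequence are hypothetical, and no attempt is made to model accessibility, stability, or specificity of the resulting oligonucleotide.

```python
# Base-pairing partners for a DNA oligonucleotide annealing to RNA.
_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(mrna, start, length):
    """Return the antisense DNA oligonucleotide (5'->3') for an mRNA region.

    `start` is a 0-based offset into `mrna`; the result is the reverse
    complement of mrna[start:start + length], written as DNA.
    """
    target = mrna[start:start + length].upper()
    if len(target) != length:
        raise ValueError("target region runs past the end of the mRNA")
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(target))
```

For example, the region AUGGCC yields the antisense oligonucleotide GGCCAT.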

Alternatively, a class of antisense molecules can be used to inactivate mRNA in order to 
decrease expression of protease nucleic acid. Accordingly, these molecules can treat a disorder 
characterized by abnormal or undesired protease nucleic acid expression. This technique involves 
cleavage by means of ribozymes containing nucleotide sequences complementary to one or more 
regions in the mRNA that attenuate the ability of the mRNA to be translated. Possible regions 
include coding regions and particularly coding regions corresponding to the catalytic and other 
functional activities of the protease protein, such as substrate binding. 

The nucleic acid molecules also provide vectors for gene therapy in patients containing cells 
that are aberrant in protease gene expression. Thus, recombinant cells, which include the patient's 
cells that have been engineered ex vivo and returned to the patient, are introduced into an individual 
where the cells produce the desired protease protein to treat the individual. 

The invention also encompasses kits for detecting the presence of a protease nucleic acid in 
a biological sample. Experimental data as provided in Figure 1 indicates that protease proteins of 
the present invention are expressed in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, 
fetal brain, bone marrow, and in cancers. Specifically, a virtual northern blot shows expression in 
cancers. In addition, PCR-based tissue screening panels indicate expression in testis, placenta, fetal 
lung, fetal kidney, fetal heart, fetal brain, and bone marrow. For example, the kit can comprise 
reagents such as a labeled or labelable nucleic acid or agent capable of detecting protease nucleic 
acid in a biological sample; means for determining the amount of protease nucleic acid in the 
sample; and means for comparing the amount of protease nucleic acid in the sample with a standard. 
The compound or agent can be packaged in a suitable container. The kit can further comprise 
instructions for using the kit to detect protease protein mRNA or DNA. 

Nucleic Acid Arrays 

The present invention further provides nucleic acid detection kits, such as arrays or 
microarrays of nucleic acid molecules that are based on the sequence information provided in 
Figures 1 and 3 (SEQ ID NOS:1 and 3). 

As used herein, "Arrays" or "Microarrays" refers to an array of distinct polynucleotides or 
oligonucleotides synthesized on a substrate, such as paper, nylon or other type of membrane, 
filter, chip, glass slide, or any other suitable solid support. In one embodiment, the microarray is 
prepared and used according to the methods described in US Patent 5,837,832, Chee et al., PCT 
application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996; Nat. Biotech. 14: 1675-1680) 
and Schena, M. et al. (1996; Proc. Natl. Acad. Sci. 93: 10614-10619), all of which are 
incorporated herein in their entirety by reference. In other embodiments, such arrays are 
produced by the methods described by Brown et al., US Patent No. 5,807,522. 

The microarray or detection kit is preferably composed of a large number of unique, 
single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or 
fragments of cDNAs, fixed to a solid support. The oligonucleotides are preferably about 6-60 
nucleotides in length, more preferably 15-30 nucleotides in length, and most preferably about 20-25 
nucleotides in length. For a certain type of microarray or detection kit, it may be preferable to 
use oligonucleotides that are only 7-20 nucleotides in length. The microarray or detection kit 
may contain oligonucleotides that cover the known 5', or 3', sequence, sequential 
oligonucleotides which cover the full length sequence; or unique oligonucleotides selected from 
particular areas along the length of the sequence. Polynucleotides used in the microarray or 
detection kit may be oligonucleotides that are specific to a gene or genes of interest. 

In order to produce oligonucleotides to a known sequence for a microarray or detection 
kit, the gene(s) of interest (or an ORF identified from the contigs of the present invention) is 
typically examined using a computer algorithm which starts at the 5' or at the 3' end of the 
nucleotide sequence. Typical algorithms will then identify oligomers of defined length that are 
unique to the gene, have a GC content within a range suitable for hybridization, and lack 
predicted secondary structure that may interfere with hybridization. In certain situations it may 
be appropriate to use pairs of oligonucleotides on a microarray or detection kit. The "pairs" will 
be identical, except for one nucleotide that preferably is located in the center of the sequence. 
The second oligonucleotide in the pair (mismatched by one) serves as a control. The number of 
oligonucleotide pairs may range from two to one million. The oligomers are synthesized at 
designated areas on a substrate using a light-directed chemical process. The substrate may be 
paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid 
support. 
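
The oligomer selection outlined above can be sketched as a simple scan. This is a minimal illustration, not the algorithm of any particular software package: it checks only the length, GC-content, and uniqueness criteria, and it omits the secondary-structure prediction mentioned in the text. The GC range, function names, and sequences are hypothetical, and the center base of the control is swapped for its complement as one arbitrary way of introducing the single mismatch.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_oligos(gene, background, length=20, gc_min=0.4, gc_max=0.6):
    """Scan `gene` from its 5' end and return (offset, oligo) candidates.

    An oligo is kept if its GC fraction lies in [gc_min, gc_max] and it
    does not occur in any sequence of `background` (other genes).
    """
    picked = []
    for i in range(len(gene) - length + 1):
        oligo = gene[i:i + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue
        if any(oligo in other for other in background):
            continue
        picked.append((i, oligo))
    return picked


def mismatch_control(oligo):
    """Return the paired control: identical except for the center base."""
    swap = {"A": "T", "T": "A", "G": "C", "C": "G"}
    mid = len(oligo) // 2
    return oligo[:mid] + swap[oligo[mid]] + oligo[mid + 1:]
```

Each selected oligonucleotide and its `mismatch_control` partner form one "pair" in the sense used above, with the mismatched member serving as the hybridization control.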



In another aspect, an oligonucleotide may be synthesized on the surface of the substrate 
by using a chemical coupling procedure and an ink jet application apparatus, as described in PCT 
application WO95/251116 (Baldeschweiler et al.) which is incorporated herein in its entirety by 
reference. In another aspect, a "gridded" array analogous to a dot (or slot) blot may be used to 
arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a 
vacuum system, thermal, UV, mechanical or chemical bonding procedures. An array, such as 
those described above, may be produced by hand or by using available devices (slot blot or dot 
blot apparatus), materials (any suitable solid support), and machines (including robotic 
instruments), and may contain 8, 24, 96, 384, 1536, 6144 or more oligonucleotides, or any other 
number between two and one million which lends itself to the efficient use of commercially 
available instrumentation. 

In order to conduct sample analysis using a microarray or detection kit, the RNA or DNA 
from a biological sample is made into hybridization probes. The mRNA is isolated, and cDNA is 
produced and used as a template to make antisense RNA (aRNA). The aRNA is amplified in the 
presence of fluorescent nucleotides, and labeled probes are incubated with the microarray or 
detection kit so that the probe sequences hybridize to complementary oligonucleotides of the 
microarray or detection kit. Incubation conditions are adjusted so that hybridization occurs with 
precise complementary matches or with various degrees of less complementarity. After removal 
of nonhybridized probes, a scanner is used to determine the levels and patterns of fluorescence. 
The scanned images are examined to determine degree of complementarity and the relative 
abundance of each oligonucleotide sequence on the microarray or detection kit. The biological 
samples may be obtained from any bodily fluids (such as blood, urine, saliva, phlegm, gastric 
juices, etc.), cultured cells, biopsies, or other tissue preparations. A detection system may be 
used to measure the absence, presence, and amount of hybridization for all of the distinct 
sequences simultaneously. This data may be used for large-scale correlation studies on the 
sequences, expression patterns, mutations, variants, or polymorphisms among samples. 

Using such arrays, the present invention provides methods to identify the expression of 
the protease proteins/peptides of the present invention. In detail, such methods comprise 
incubating a test sample with one or more nucleic acid molecules and assaying for binding of the 
nucleic acid molecule with components within the test sample. Such assays will typically 
involve arrays comprising many genes, at least one of which is a gene of the present invention 
and/or alleles of the protease gene of the present invention. Figure 3 provides information on 
SNPs that have been identified in the gene encoding the protease protein of the present 
invention. SNPs, including indels (indicated by a "-"), were identified at 69 different nucleotide 
positions. Non-synonymous cSNPs were identified at position 30496. The changes in the amino 
acid sequence caused by these SNPs are indicated in Figure 3 and can readily be determined using 
the universal genetic code and the protein sequence provided in Figure 2 as a reference. SNPs 
outside the ORF and in introns may affect control/regulatory elements. 
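
The statement that the amino acid change caused by a coding SNP can be determined from the universal genetic code can be illustrated as follows. The sketch assumes the SNP position is expressed as a 0-based offset into an in-frame coding sequence, which differs from the genomic coordinates used in Figure 3; the function name and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(cds, position, alt_base):
    """Classify a single-base substitution in a coding sequence.

    Returns (reference amino acid, variant amino acid, classification).
    """
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind
```

In the hypothetical sequence ATGGCTGAA, changing the third base of GCT to C leaves alanine unchanged, whereas changing the first base of GAA to A replaces glutamate with lysine.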

Conditions for incubating a nucleic acid molecule with a test sample vary. Incubation 
conditions depend on the format employed in the assay, the detection methods employed, and the 
type and nature of the nucleic acid molecule used in the assay. One skilled in the art will 
recognize that any one of the commonly available hybridization, amplification or array assay 
formats can readily be adapted to employ the novel fragments of the Human genome disclosed 
herein. Examples of such assays can be found in Chard, T, An Introduction to 
Radioimmunoassay and Related Techniques. Elsevier Science Publishers, Amsterdam, The 
Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic 
Press, Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and 
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular 
Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985). 

The test samples of the present invention include cells, protein or membrane extracts of 
cells. The test sample used in the above-described method will vary based on the assay format, 
nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. 
Methods for preparing nucleic acid extracts of cells are well known in the art and can be 
readily adapted in order to obtain a sample that is compatible with the system utilized. 

In another embodiment of the present invention, kits are provided which contain the 
necessary reagents to carry out the assays of the present invention. 

Specifically, the invention provides a compartmentalized kit to receive, in close 
confinement, one or more containers which comprises: (a) a first container comprising one of the 
nucleic acid molecules that can bind to a fragment of the Human genome disclosed herein; and 
(b) one or more other containers comprising one or more of the following: wash reagents, 
reagents capable of detecting presence of a bound nucleic acid. 

In detail, a compartmentalized kit includes any kit in which reagents are contained in 
separate containers. Such containers include small glass containers, plastic containers, strips of 
plastic, glass or paper, or arraying material such as silica. Such containers allow one to 
efficiently transfer reagents from one compartment to another compartment such that the 
samples and reagents are not cross-contaminated, and the agents or solutions of each container 
can be added in a quantitative fashion from one compartment to another. Such containers will 
include a container which will accept the test sample, a container which contains the nucleic acid 
probe, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, 
etc.), and containers which contain the reagents used to detect the bound probe. One skilled in 
the art will readily recognize that the previously unidentified protease gene of the present 
invention can be routinely identified using the sequence information disclosed herein and can be 
readily incorporated into one of the established kit formats which are well known in the art, 
particularly expression arrays. 



Vectors/host cells 

The invention also provides vectors containing the nucleic acid molecules described herein. 
The term "vector" refers to a vehicle, preferably a nucleic acid molecule, which can transport the 
nucleic acid molecules. When the vector is a nucleic acid molecule, the nucleic acid molecules are 
covalently linked to the vector nucleic acid. With this aspect of the invention, the vector includes a 
plasmid, single or double stranded phage, a single or double stranded RNA or DNA viral vector, or 
artificial chromosome, such as a BAC, PAC, YAC, or MAC. 

A vector can be maintained in the host cell as an extrachromosomal element where it 
replicates and produces additional copies of the nucleic acid molecules. Alternatively, the vector 
may integrate into the host cell genome and produce additional copies of the nucleic acid molecules 
when the host cell replicates. 

The invention provides vectors for the maintenance (cloning vectors) or vectors for 
expression (expression vectors) of the nucleic acid molecules. The vectors can function in 
prokaryotic or eukaryotic cells or in both (shuttle vectors). 

Expression vectors contain cis-acting regulatory regions that are operably linked in the 
vector to the nucleic acid molecules such that transcription of the nucleic acid molecules is allowed 
in a host cell. The nucleic acid molecules can be introduced into the host cell with a separate 
nucleic acid molecule capable of affecting transcription. Thus, the second nucleic acid molecule 
may provide a trans-acting factor interacting with the cis-regulatory control region to allow 
transcription of the nucleic acid molecules from the vector. Alternatively, a trans-acting factor may 
be supplied by the host cell. Finally, a trans-acting factor can be produced from the vector itself. It 
is understood, however, that in some embodiments, transcription and/or translation of the nucleic 
acid molecules can occur in a cell-free system. 

The regulatory sequences to which the nucleic acid molecules described herein can be 
operably linked include promoters for directing mRNA transcription. These include, but are not 
limited to, the left promoter from bacteriophage λ, the lac, TRP, and TAC promoters from E. coli, 
the early and late promoters from SV40, the CMV immediate early promoter, the adenovirus early 
and late promoters, and retrovirus long-terminal repeats. 

In addition to control regions that promote transcription, expression vectors may also 
include regions that modulate transcription, such as repressor binding sites and enhancers. 
Examples include the SV40 enhancer, the cytomegalovirus immediate early enhancer, polyoma 
enhancer, adenovirus enhancers, and retrovirus LTR enhancers. 

In addition to containing sites for transcription initiation and control, expression vectors can 
also contain sequences necessary for transcription termination and, in the transcribed region, a 
ribosome binding site for translation. Other regulatory control elements for expression include 
initiation and termination codons as well as polyadenylation signals. The person of ordinary skill in 
the art would be aware of the numerous regulatory sequences that are useful in expression vectors. 
Such regulatory sequences are described, for example, in Sambrook et al., Molecular Cloning: A 
Laboratory Manual. 2nd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 
(1989). 

A variety of expression vectors can be used to express a nucleic acid molecule. Such 
vectors include chromosomal, episomal, and virus-derived vectors, for example vectors derived 
from bacterial plasmids, from bacteriophage, from yeast episomes, from yeast chromosomal 
elements, including yeast artificial chromosomes, from viruses such as baculoviruses, 
papovaviruses such as SV40, Vaccinia viruses, adenoviruses, poxviruses, pseudorabies viruses and 
retroviruses. Vectors may also be derived from combinations of these sources such as those derived 
from plasmid and bacteriophage genetic elements, e.g. cosmids and phagemids. Appropriate 
cloning and expression vectors for prokaryotic and eukaryotic hosts are described in Sambrook et 
al., Molecular Cloning: A Laboratory Manual. 2nd. ed., Cold Spring Harbor Laboratory Press, Cold 
Spring Harbor, NY, (1989). 

The regulatory sequence may provide constitutive expression in one or more host cells (i.e. 
tissue specific) or may provide for inducible expression in one or more cell types such as by 
temperature, nutrient additive, or exogenous factor such as a hormone or other ligand. A variety of 
vectors providing for constitutive and inducible expression in prokaryotic and eukaryotic hosts are 
well known to those of ordinary skill in the art. 

The nucleic acid molecules can be inserted into the vector nucleic acid by well-known 
methodology. Generally, the DNA sequence that will ultimately be expressed is joined to an 
expression vector by cleaving the DNA sequence and the expression vector with one or more 
restriction enzymes and then ligating the fragments together. Procedures for restriction enzyme 
digestion and ligation are well known to those of ordinary skill in the art. 

The vector containing the appropriate nucleic acid molecule can be introduced into an 
appropriate host cell for propagation or expression using well-known techniques. Bacterial cells 
include, but are not limited to, E. coli, Streptomyces, and Salmonella typhimurium. Eukaryotic cells 
include, but are not limited to, yeast, insect cells such as Drosophila, animal cells such as COS and 
CHO cells, and plant cells. 

As described herein, it may be desirable to express the peptide as a fusion protein. 
Accordingly, the invention provides fusion vectors that allow for the production of the peptides. 
Fusion vectors can increase the expression of a recombinant protein, increase the solubility of the 
recombinant protein, and aid in the purification of the protein by acting for example as a ligand for 
affinity purification. A proteolytic cleavage site may be introduced at the junction of the fusion 
moiety so that the desired peptide can ultimately be separated from the fusion moiety. Proteolytic 
enzymes include, but are not limited to, factor Xa, thrombin, and enteroprotease. Typical fusion 
expression vectors include pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New England 
Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione S-transferase 
(GST), maltose E binding protein, or protein A, respectively, to the target recombinant 
protein. Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann 
et al., Gene 59:301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods 
in Enzymology 185:60-89 (1990)). 

Recombinant protein expression can be maximized in host bacteria by providing a genetic 
background wherein the host cell has an impaired capacity to proteolytically cleave the recombinant 
protein (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic 
Press, San Diego, California (1990) 119-128). Alternatively, the sequence of the nucleic acid 
molecule of interest can be altered to provide preferential codon usage for a specific host cell, for 
example E. coli (Wada et al., Nucleic Acids Res. 20:2111-2118 (1992)). 
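
Altering a sequence for preferential codon usage amounts to swapping each codon for a synonymous codon favoured by the host while leaving the encoded peptide unchanged. The following is a minimal sketch under stated assumptions: the `PREFERRED` table is a hypothetical, incomplete placeholder rather than measured E. coli usage data, and the input is the first eight codons of the open reading frame of SEQ ID NO:1.

```python
# Illustrative sketch only: synonymous codon substitution for a chosen host.
# PREFERRED is a hypothetical, incomplete preference table, not measured usage data.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PREFERRED = {"L": "CTG", "K": "AAA", "P": "CCG", "A": "GCG"}  # assumed values
# Guard against a preference table that would change the encoded residue.
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED.items())


def codons(cds: str) -> list[str]:
    """Split a coding sequence into codons, rejecting partial codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(cds))


def recode(cds: str, preferred: dict[str, str] = PREFERRED) -> str:
    """Replace each codon with the host-preferred synonym when one is listed."""
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons(cds))


original = "ATGCTGAAGCCATGGATGATTGCC"  # codons 1-8 of the ORF in SEQ ID NO:1
recoded = recode(original)
print(recoded)
print(translate(original) == translate(recoded))
```

Amino acids without an entry in the table keep their original codon, so the sketch can be extended one residue at a time as usage data for the intended host become available.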

The nucleic acid molecules can also be expressed by expression vectors that are operative in 
yeast. Examples of vectors for expression in yeast, e.g., S. cerevisiae, include pYepSec1 (Baldari, et 
al., EMBO J. 5:229-234 (1987)), pMFa (Kurjan et al., Cell 30:933-943 (1982)), pJRY88 (Schultz et 
al., Gene 54:113-123 (1987)), and pYES2 (Invitrogen Corporation, San Diego, CA). 

The nucleic acid molecules can also be expressed in insect cells using, for example, 
baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured 
insect cells (e.g., Sf 9 cells) include the pAc series (Smith et al., Mol. Cell Biol. 3:2156-2165 
(1983)) and the pVL series (Lucklow et al., Virology 170:31-39 (1989)). 

In certain embodiments of the invention, the nucleic acid molecules described herein are 
expressed in mammalian cells using mammalian expression vectors. Examples of mammalian 
expression vectors include pCDM8 (Seed, B. Nature 329:840 (1987)) and pMT2PC (Kaufman et al., 
EMBO J. 6:187-195 (1987)). 

The expression vectors listed herein are provided by way of example only of the well- 
known vectors available to those of ordinary skill in the art that would be useful to express the 
nucleic acid molecules. The person of ordinary skill in the art would be aware of other vectors 
suitable for maintenance, propagation, or expression of the nucleic acid molecules described herein. 
These are found for example in Sambrook, J., Fritsch, E. F., and Maniatis, T. Molecular Cloning: A 
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory 
Press, Cold Spring Harbor, NY, 1989. 

The invention also encompasses vectors in which the nucleic acid sequences described 
herein are cloned into the vector in reverse orientation, but operably linked to a regulatory sequence 
that permits transcription of antisense RNA. Thus, an antisense transcript can be produced to all, or 
to a portion, of the nucleic acid molecule sequences described herein, including both coding and 
non-coding regions. Expression of this antisense RNA is subject to each of the parameters 
described above in relation to expression of the sense RNA (regulatory sequences, constitutive or 
inducible expression, tissue-specific expression). 

The invention also relates to recombinant host cells containing the vectors described herein. 
Host cells therefore include prokaryotic cells, lower eukaryotic cells such as yeast, other eukaryotic 
cells such as insect cells, and higher eukaryotic cells such as mammalian cells. 

The recombinant host cells are prepared by introducing the vector constructs described 
herein into the cells by techniques readily available to the person of ordinary skill in the art. These 
include, but are not limited to, calcium phosphate transfection, DEAE-dextran-mediated 
transfection, cationic lipid-mediated transfection, electroporation, transduction, infection, 
lipofection, and other techniques such as those found in Sambrook, et al. (Molecular Cloning: A 
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory 
Press, Cold Spring Harbor, NY, 1989). 

Host cells can contain more than one vector. Thus, different nucleotide sequences can be 
introduced on different vectors of the same cell. Similarly, the nucleic acid molecules can be 
introduced either alone or with other nucleic acid molecules that are not related to the nucleic acid 
molecules such as those providing trans-acting factors for expression vectors. When more than one 
vector is introduced into a cell, the vectors can be introduced independently, co-introduced or joined 
to the nucleic acid molecule vector. 

In the case of bacteriophage and viral vectors, these can be introduced into cells as packaged 
or encapsulated virus by standard procedures for infection and transduction. Viral vectors can be 
replication-competent or replication-defective. In the case in which viral replication is defective, 
replication will occur in host cells providing functions that complement the defects. 

Vectors generally include selectable markers that enable the selection of the subpopulation 
of cells that contain the recombinant vector constructs. The marker can be contained in the same 
vector that contains the nucleic acid molecules described herein or may be on a separate vector. 
Markers include tetracycline or ampicillin-resistance genes for prokaryotic host cells and 
dihydrofolate reductase or neomycin resistance for eukaryotic host cells. However, any marker that 
provides selection for a phenotypic trait will be effective. 

While the mature proteins can be produced in bacteria, yeast, mammalian cells, and other 
cells under the control of the appropriate regulatory sequences, cell-free transcription and 
translation systems can also be used to produce these proteins using RNA derived from the DNA 
constructs described herein. 

Where secretion of the peptide is desired, which is difficult to achieve with 
multi-transmembrane domain containing proteins such as proteases, appropriate secretion signals are 
incorporated into the vector. The signal sequence can be endogenous to the peptides or 
heterologous to these peptides. 

Where the peptide is not secreted into the medium, which is typically the case with 
proteases, the protein can be isolated from the host cell by standard disruption procedures, including 
freeze thaw, sonication, mechanical disruption, use of lysing agents and the like. The peptide can 
then be recovered and purified by well-known purification methods including ammonium sulfate 
precipitation, acid extraction, anion or cationic exchange chromatography, phosphocellulose 
chromatography, hydrophobic-interaction chromatography, affinity chromatography, 
hydroxylapatite chromatography, lectin chromatography, or high performance liquid 
chromatography. 

It is also understood that depending upon the host cell in recombinant production of the 
peptides described herein, the peptides can have various glycosylation patterns, depending upon the 
cell, or may be non-glycosylated as when produced in bacteria. In addition, the peptides may 
include an initial modified methionine in some cases as a result of a host-mediated process. 



Uses of vectors and host cells 

The recombinant host cells expressing the peptides described herein have a variety of uses. 
First, the cells are useful for producing a protease protein or peptide that can be further purified to 
produce desired amounts of protease protein or fragments. Thus, host cells containing expression 
vectors are useful for peptide production. 

Host cells are also useful for conducting cell-based assays involving the protease protein or 
protease protein fragments, such as those described above as well as other formats known in the art. 
Thus, a recombinant host cell expressing a native protease protein is useful for assaying compounds 
that stimulate or inhibit protease protein function. 

Host cells are also useful for identifying protease protein mutants in which these functions 
are affected. If the mutants naturally occur and give rise to a pathology, host cells containing the 
mutations are useful to assay compounds that have a desired effect on the mutant protease protein 
(for example, stimulating or inhibiting function) which may not be indicated by their effect on the 
native protease protein. 

Genetically engineered host cells can be further used to produce non-human transgenic 
animals. A transgenic animal is preferably a mammal, for example a rodent, such as a rat or mouse, 
in which one or more of the cells of the animal include a transgene. A transgene is exogenous DNA 
which is integrated into the genome of a cell from which a transgenic animal develops and which 
remains in the genome of the mature animal in one or more cell types or tissues of the transgenic 
animal. These animals are useful for studying the function of a protease protein and identifying and 
evaluating modulators of protease protein activity. Other examples of transgenic animals include 
non-human primates, sheep, dogs, cows, goats, chickens, and amphibians. 

A transgenic animal can be produced by introducing nucleic acid into the male pronuclei of 
a fertilized oocyte, e.g., by microinjection or retroviral infection, and allowing the oocyte to develop 
in a pseudopregnant female foster animal. Any of the protease protein nucleotide sequences can be 
introduced as a transgene into the genome of a non-human animal, such as a mouse. 

Any of the regulatory or other sequences useful in expression vectors can form part of the 
transgenic sequence. This includes intronic sequences and polyadenylation signals, if not already 
included. A tissue-specific regulatory sequence(s) can be operably linked to the transgene to direct 
expression of the protease protein to particular cells. 

Methods for generating transgenic animals via embryo manipulation and microinjection, 
particularly animals such as mice, have become conventional in the art and are described, for 
example, in U.S. Patent Nos. 4,736,866 and 4,870,009, both by Leder et al., U.S. Patent No. 
4,873,191 by Wagner et al., and in Hogan, B., Manipulating the Mouse Embryo, (Cold Spring 
Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). Similar methods are used for 
production of other transgenic animals. A transgenic founder animal can be identified based upon 
the presence of the transgene in its genome and/or expression of transgenic mRNA in tissues or 
cells of the animals. A transgenic founder animal can then be used to breed additional animals 
carrying the transgene. Moreover, transgenic animals carrying a transgene can further be bred to 
other transgenic animals carrying other transgenes. A transgenic animal also includes animals in 
which the entire animal or tissues in the animal have been produced using the homologously 
recombinant host cells described herein. 

In another embodiment, transgenic non-human animals can be produced which contain 
selected systems that allow for regulated expression of the transgene. One example of such a 
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP 
recombinase system, see, e.g., Lakso et al. PNAS 89:6232-6236 (1992). Another example of a 
recombinase system is the FLP recombinase system of S. cerevisiae (O'Gorman et al. Science 
257:1351-1355 (1991)). If a cre/loxP recombinase system is used to regulate expression of the 
transgene, animals containing transgenes encoding both the Cre recombinase and a selected protein 
are required. Such animals can be provided through the construction of "double" transgenic animals, 
e.g., by mating two transgenic animals, one containing a transgene encoding a selected protein and 
the other containing a transgene encoding a recombinase. 

Clones of the non-human transgenic animals described herein can also be produced 
according to the methods described in Wilmut, I. et al. Nature 385:810-813 (1997) and PCT 
International Publication Nos. WO 97/07668 and WO 97/07669. In brief, a cell, e.g., a somatic cell, 
from the transgenic animal can be isolated and induced to exit the growth cycle and enter G0 phase. 
The quiescent cell can then be fused, e.g., through the use of electrical pulses, to an enucleated 
oocyte from an animal of the same species from which the quiescent cell is isolated. The 
reconstructed oocyte is then cultured such that it develops to morula or blastocyst and then 
transferred to a pseudopregnant female foster animal. The offspring born of this female foster animal 
will be a clone of the animal from which the cell, e.g., the somatic cell, is isolated. 

Transgenic animals containing recombinant cells that express the peptides described herein 
are useful to conduct the assays described herein in an in vivo context. Accordingly, the various 
physiological factors that are present in vivo and that could affect substrate binding, protease protein 
activity/activation, and signal transduction, may not be evident from in vitro cell-free or cell-based 
assays. Accordingly, it is useful to provide non-human transgenic animals to assay in vivo protease 
protein function, including substrate interaction, the effect of specific mutant protease proteins on 
protease protein function and substrate interaction, and the effect of chimeric protease proteins. It is 
also possible to assess the effect of null mutations, that is, mutations that substantially or completely 
eliminate one or more protease protein functions. 

All publications and patents mentioned in the above specification are herein incorporated 
by reference. Various modifications and variations of the described method and system of the 
invention will be apparent to those skilled in the art without departing from the scope and spirit 
of the invention. Although the invention has been described in connection with specific 
preferred embodiments, it should be understood that the invention as claimed should not be 
unduly limited to such specific embodiments. Indeed, various modifications of the above- 
described modes for carrying out the invention which are obvious to those skilled in the field of 
molecular biology or related fields are intended to be within the scope of the following claims. 

Claims 

That which is claimed is: 



1. An isolated peptide consisting of an amino acid sequence selected from the group 
consisting of: 

(a) an amino acid sequence shown in SEQ ID NO:2; 

(b) an amino acid sequence of an allelic variant of an amino acid sequence 
shown in SEQ ID NO:2, wherein said allelic variant is encoded by a nucleic acid molecule that 
hybridizes under stringent conditions to the opposite strand of a nucleic acid molecule shown in 
SEQ ID NOS:1 or 3; 

(c) an amino acid sequence of an ortholog of an amino acid sequence shown in 
SEQ ID NO:2, wherein said ortholog is encoded by a nucleic acid molecule that hybridizes under 
stringent conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; 
and 

(d) a fragment of an amino acid sequence shown in SEQ ID NO:2, wherein said 
fragment comprises at least 10 contiguous amino acids. 

2. An isolated peptide comprising an amino acid sequence selected from the group 
consisting of: 

(a) an amino acid sequence shown in SEQ ID NO:2; 

(b) an amino acid sequence of an allelic variant of an amino acid sequence 
shown in SEQ ID NO:2, wherein said allelic variant is encoded by a nucleic acid molecule that 
hybridizes under stringent conditions to the opposite strand of a nucleic acid molecule shown in 
SEQ ID NOS:1 or 3; 

(c) an amino acid sequence of an ortholog of an amino acid sequence shown in 
SEQ ID NO:2, wherein said ortholog is encoded by a nucleic acid molecule that hybridizes under 
stringent conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; 
and 

(d) a fragment of an amino acid sequence shown in SEQ ID NO:2, wherein said 
fragment comprises at least 10 contiguous amino acids. 



3. An isolated antibody that selectively binds to a peptide of claim 2. 

4. An isolated nucleic acid molecule consisting of a nucleotide sequence selected from 
the group consisting of: 

(a) a nucleotide sequence that encodes an amino acid sequence shown in SEQ 
ID NO:2; 

(b) a nucleotide sequence that encodes an allelic variant of an amino acid 
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent 
conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; 

(c) a nucleotide sequence that encodes an ortholog of an amino acid sequence 
shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent conditions to 
the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; 

(d) a nucleotide sequence that encodes a fragment of an amino acid sequence 
shown in SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous amino acids; and 

(e) a nucleotide sequence that is the complement of a nucleotide sequence of 
(a)-(d). 



5. An isolated nucleic acid molecule comprising a nucleotide sequence selected from 
the group consisting of: 

(a) a nucleotide sequence that encodes an amino acid sequence shown in SEQ 
ID NO:2; 

(b) a nucleotide sequence that encodes an allelic variant of an amino acid 
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent 
conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; 

(c) a nucleotide sequence that encodes an ortholog of an amino acid sequence 
shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent conditions to 
the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; 

(d) a nucleotide sequence that encodes a fragment of an amino acid sequence 
shown in SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous amino acids; and 

(e) a nucleotide sequence that is the complement of a nucleotide sequence of 
(a)-(d). 



6. A gene chip comprising a nucleic acid molecule of claim 5. 

7. A transgenic non-human animal comprising a nucleic acid molecule of claim 5. 

8. A nucleic acid vector comprising a nucleic acid molecule of claim 5. 

9. A host cell containing the vector of claim 8. 

10. A method for producing any of the peptides of claim 1 comprising introducing a 
nucleotide sequence encoding any of the amino acid sequences in (a)-(d) into a host cell, and 
culturing the host cell under conditions in which the peptides are expressed from the nucleotide 
sequence. 



11. A method for producing any of the peptides of claim 2 comprising introducing a 
nucleotide sequence encoding any of the amino acid sequences in (a)-(d) into a host cell, and 
culturing the host cell under conditions in which the peptides are expressed from the nucleotide 
sequence. 



12. A method for detecting the presence of any of the peptides of claim 2 in a sample, 
said method comprising contacting said sample with a detection agent that specifically allows 
detection of the presence of the peptide in the sample and then detecting the presence of the peptide. 

13. A method for detecting the presence of a nucleic acid molecule of claim 5 in a 
sample, said method comprising contacting the sample with an oligonucleotide that hybridizes to 
said nucleic acid molecule under stringent conditions and determining whether the oligonucleotide 
binds to said nucleic acid molecule in the sample. 



14. A method for identifying a modulator of a peptide of claim 2, said method 
comprising contacting said peptide with an agent and determining if said agent has modulated the 
function or activity of said peptide. 



15. The method of claim 14, wherein said agent is administered to a host cell comprising 
an expression vector that expresses said peptide. 

16. A method for identifying an agent that binds to any of the peptides of claim 2, said 
method comprising contacting the peptide with an agent and assaying the contacted mixture to 
determine whether a complex is formed with the agent bound to the peptide. 

17. A pharmaceutical composition comprising an agent identified by the method of 
claim 16 and a pharmaceutically acceptable carrier therefor. 

18. A method for treating a disease or condition mediated by a human protease protein, 
said method comprising administering to a patient a pharmaceutically effective amount of an agent 
identified by the method of claim 16. 

19. A method for identifying a modulator of the expression of a peptide of claim 2, said 
method comprising contacting a cell expressing said peptide with an agent, and determining if said 
agent has modulated the expression of said peptide. 

20. An isolated human protease peptide having an amino acid sequence that shares at 
least 70% homology with an amino acid sequence shown in SEQ ID NO:2. 

21. A peptide according to claim 20 that shares at least 90 percent homology with an 
amino acid sequence shown in SEQ ID NO:2. 

22. An isolated nucleic acid molecule encoding a human protease peptide, said nucleic 
acid molecule sharing at least 80 percent homology with a nucleic acid molecule shown in SEQ ID 
NOS:1 or 3. 

23. A nucleic acid molecule according to claim 22 that shares at least 90 percent 
homology with a nucleic acid molecule shown in SEQ ID NOS:1 or 3. 

1 CGCCCTTATG CTGAAGCCAT GGATGATTGC CGTTCTCATT GTGTTCTCCC 
51 TGACAGTGGT GGCAGTGACC ATAGGTCTCC TGGTTCACTT CCTAGTATTT 
Isl ^f ACTA TCAKGCTCC TTTaIaS £ag££S 

»| ^5^™ AATTTCGQAC AAAGCAACAC ATATCAACTT AAGGACTTAC 
^^GAC CGAAAATTTG GTGGATGAGA TATTTATAGA TTCAGCCTGG 
251 AAGAAAAATT ATATCAAGAA CCAAGTAGTC AGACTGACTC CAGAGGAAGA 
301 TGGTGTGAAA GTAGATGTCA TTATGGTGTT CCAGTTCCCC 
351 AAAGGGCAGT AAGAGAGAAG AAAATCCAAA GCATCTTAAA tSgAAG^A 

4oi aggaatttaa gagccttgcc aataaatocc tcatcagttc 
501 gSS ^cagggg agttaactgt ccaagcaagt 
501 gagttgttcc attaaacgtc aacagaatag catctggagt cattgcaccc 

f" A ^? 3GCCT GGCCTTGGCA AGCTTCCCTT CAGTATGATA AO^S™ 
^^ GCC ACCTTGATTA G^AACaCATO GCTTGTCACT GCA^Sct 
651 GCTTCCAGAA GTATAAAAAT CCACATCAAT GGACTGTTAG TTTTOGAA^ 
IS ^^ CC CTCCCTT AAT GAAAAGAAAT GTCAGAAGAT Sa^S 
££££££ SSS^S CAAGAGAGTA CGACATTGCT G^S 
AGTCACCTTT TCGGATGACA TACGCOGGAT TTGTTTGCCA 
^ 2 A ^?' CTG ACCAAATTTG ACTGTCCACA TCACAGGATT 

«, ^ A ^ CTT TACTATGGTG GGGAATCCCA AAATGATCTC £gA^5£ 
951 GAGTGAAAAT CATAAGTGAC GATGTCTGCA AGCAACCACA cwty^SS^ 

AACCTGGAAT GTTCTGTGCC GGATATATGG ^AA^ 
1051 TGATGCCTGC AGGGGTGATT CTGGGGGACC TTTAGTCACA ar^at^^ 
1101 AAGATACGTG GTATCTCATT GGAATTGTAA GCTGGGGAGA 
1151 CAAAAGGACA AGCCTGGAGT CTACACACAA GTCACTTATT ACCGAAACTG 

1201 GATTGCTTCA AAAACAGGCA TCTAA (SEQ ID NO:1) 
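
The coordinates annotated for this transcript (start codon at position 8, stop codon at position 1223) can be checked against the printed sequence. The following is a minimal sketch, assuming the standard genetic code and 1-based coordinates in which each annotation gives the first base of the codon; the two test strings are copied from the legible first and last lines of the sequence, and the variable names are illustrative.

```python
# Illustrative sketch only: reading-frame check for the annotated open reading frame.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

START_CODON = 8     # first base of the ATG, as annotated
STOP_CODON = 1223   # first base of the stop codon, as annotated


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


coding_length = STOP_CODON - START_CODON   # nucleotides, excluding the stop codon
assert coding_length % 3 == 0              # start and stop share one reading frame
protein_length = coding_length // 3

orf_start = "ATGCTGAAGCCATGGATGATTGCC"    # positions 8-31
orf_end = "attgcttcaaaaacaggcatctaa"      # positions 1202-1225
assert (1202 - START_CODON) % 3 == 0       # orf_end begins on a codon boundary

print(protein_length)        # 405
print(translate(orf_start))  # MLKPWMIA
print(translate(orf_end))    # IASKTGI*
```

The computed length of 405 residues agrees with the serine active-site motif reported at residues 349-360, and the translated ends agree with the first residues of the peptide sequence and the last residues of the query in the BLAST alignment.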



5' UTR: 1-7 

Start Codon: 8 

Stop Codon: 1223 

3' UTR: 1226 

gi|7661558|ref|NP_054777.1| DESC1 protein [Homo sapiens] 

Si SliST?2« ' airway trypsin-like protease [Homo 

gi|6467958|gb|AAF13253.1|AF198087_1 (AF198087) adrenal secretor 

BLAST to dbEST: 

gi 1 1679749 /dataset=dbest /taxon=9606 

EXPRESSION INFORMATION FOR MODULATORY USE: 
library source: 

Expression information from BLAST dbEST hits: 
Primary cancers 

Human placenta 
Human fetal lung 
Human fetal kidney 
Human fetal heart 
Human fetal brain 
Human bone marrow 

1 MLKPWMIAVL IVLSLTVVAV TIGLLVHFLV FDQKKEYYHG SFKELDPQIN 
51 PNFGQSNTYQ LKDLRETTEN LVDEIFIDSA WKKNYIKNQV VRLTPEEDGV 
101 KVDVIMVFQF PSTEQRAVRE KKIQSILNQK IRNLRALPIN ASSVQVNAMS 
151 SSTGELTVQA SCGKRVVPLN VNRIASGVIA PKAAWPWQAS LQYDNIHQCG 
201 ATLISNTWLV TAAHCFQKYK NPHQWTVSFG TKINPPLMKR NVRRFIIHSK 
251 YRSAAREYDI AVVQVSSRVT FSDDIRRICL PEASASFQPN LTVHITGFGA 
301 LYYGGESQND LREARVKIIS DDVCKQPQVY GNDIKPGMFC AGYMEGIYDA 
351 CRGDSGGPLV TRDLKDTWYL IGIVSWGDNC GQKDKPGVYT QVTYYRNWIA 
401 SKTGI 

FEATURES : 

Functional domains and key regions: 
Prosite results: 

[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 

Number of matches: 2 

1 140-143 NASS 

2 290-293 NLTV 

[2] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 

Number of matches: 2 

1 41-43 SFK 

2 266-268 SSR 

[3] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 

Number of matches: 5 

1 94-97 TPEE 

2 152-155 STGE 

3 270-273 TFSD 

4 307-310 SQND 

5 375-378 SWGD 

[4] PDOC00007 PS00007 TYR_PHOSPHO_SITE 
Tyrosine kinase phosphorylation site 

362-369 RDLKDTWY 

[5] PDOC00008 PS00008 MYRISTYL 
N-myristoylation site 

Number of matches: 3 

1 54-59 GQSNTY 

2 337-342 GMFCAG 

3 346-351 GIYDAC 

[6] PDOC00009 PS00009 AMIDATION 
Amidation site 

162-165 CGKR 

[7] PDOC00016 PS00016 RGD 
Cell attachment sequence 



352-354 RGD 



[8] PDOC00124 PS00134 TRYPSIN_HIS 

Serine proteases, trypsin family, histidine active site 

210-215 VTAAHC 
[9] PDOC00124 PS00135 TRYPSIN_SER 

Serine proteases, trypsin family, serine active site 

349-360 DACRGDSGGPLV 
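
The motif coordinates listed above can be cross-checked by converting a PROSITE pattern into a regular expression and scanning the peptide. The following is a minimal sketch under stated assumptions: the pattern strings are written from memory of the PROSITE entries named in this section and should be verified against the database, the converter handles only the simple syntax used here, and the function names are illustrative. The inputs are short stretches of SEQ ID NO:2 taken from legible parts of the sequence.

```python
import re

# Illustrative sketch only. Pattern strings are assumptions recalled from the
# PROSITE entries named in this section; verify them against the database.
PATTERNS = {
    "PS00001 ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "PS00005 PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "PS00006 CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "PS00016 RGD": "R-G-D",
    "PS00134 TRYPSIN_HIS": "[LIVM]-[ST]-A-[STAG]-H-C",
}


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    tokens = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            token = core                       # literal residue or [allowed set]
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)


def scan(sequence: str, pattern: str, offset: int = 1):
    """Return 1-based (start, end, match) tuples, including overlapping hits.

    `offset` is the residue number of the first character of `sequence`.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + offset, m.start() + offset + len(m.group(1)) - 1, m.group(1))
        for m in regex.finditer(sequence)
    ]


# Short stretches of SEQ ID NO:2 around three of the sites listed above.
print(scan("IRNLRALPINASSVQVNAMS", PATTERNS["PS00001 ASN_GLYCOSYLATION"], offset=131))
print(scan("ISNTWLVTAAHCFQKYK", PATTERNS["PS00134 TRYPSIN_HIS"], offset=204))
print(scan("DACRGDSGGPLV", PATTERNS["PS00016 RGD"], offset=349))
```

The lookahead wrapper lets overlapping sites be reported, which matters for short, permissive patterns such as the phosphorylation sites.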

Membrane spanning structure and domains: 
Helix Begin End Score Certainty 

1 11 31 2.281 Certain 

2 203 223 1.014 Certain 

3 291 311 0.791 Putative 

BLAST Alignment to Top Hit: 
Alignment to top blast hit: 

>gi|7661558|ref|NP_054777.1| DESC1 protein [Homo sapiens] 

>gi|6137097|gb|AAF04328.1|AF064819_1 (AF064819) serine 
protease DESC1 [Homo sapiens] 
Length = 422 

Score = 371 bits (943), Expect = e-102 

Identities = 176/403 (43%), Positives = 267/403 (65%), Gaps = 4/403 (0%) 
Frame = +2 

Query: 14 KPWlIAVIJrVLSLTVVAVTIGL^^ ig 

+PW+I ++I +SL V+AV IGL VH++ ++QKK Y Y+ + ++ pq+ + 

Sbjct: 16 EPWVIGLVIFISLIVIAVCIGLTVHYVRY1JQKKTY1OT 75 

Query: 191 KDIJUSTTE^VDBIPIIDSAWKK^ 370 

+ E4-+V P S ++ ++K+ QfV++ + ++ GV ++++ +p STE +K 
Sb;jct: 76 TEMSQRLESMVKNAFYKSPLRKEF^QVT^ 135 

Query: 371 KI QS I LNQKIRNIiRAIjP - INAS SVQVNAMS S S TGBLTVQAS CG - KRWPLNVN - RIASGV 541 

+Q +IH-+K-M-+ P ++ SV++ +++ + + CG +R Ii+RIG 
Sbjct: 136 rVQIiVLHBKLQDAVGPPKVDPHSVICI KKXKKTEllDSYlitniCCGTRRS¥Tl/^SXiRJ^S^ 195 

Query: 542 IAPKAAWPWQASIOYDNTHQCGATLISNT 721 

+ WPWQASLQ+D H+CGATLI+ TWIiV+AAHCF YKNP +WT SPG IP M 
Sbjct: 196 EVEEG12WP WQASI^WIXSSHRCGAre PGVT I KPSKM 255 

Query: 722 KRNVRRFI IHBKYRSAAREYDIAVVQVSSRVTFSDDIRRI CLPEASASFQPNI/rVHITGF 901 

KR +RR I+H KKY+ + +YDI++ ++SS V +++ + R+CLP+AS PQP + +TGF 
Sbjct: 256 KRGTiRRirVHEK3TKHPSHDYDISL^^ 315 

Query: 902 GALYYGGESQNDIiREAKVKI ISDDVCKQPQVYGNDIKPGKFCAGYMEGI YDACRGDSGGP 1081 

_ ^ G S Q N LR+A+V +1 C +PQ Y + I P M CAG +EG DAC+ GDSGG P 
Sb 3 ct: 316 G^UiKNDGYSQNHI^QAQVTIjIDATTCirePQAYKDAI TPRWLCAGSIiEGKTDACQGDSGGP 375 

Query: 1082 LVTRDLKITrWYIiIGrVSWG^CGQKDKPGVY^ 1222 

L.V+ D +D WYI* GIVSWGD C + +KPGVYT+VT R+WI SKTGI 
Sbjct: 376 LVSSDARDI WYIiAGrvSWGDBCAKPNKPGVY TSKTGI 422 (SEQ ID NO:4) 

Hmmer search results (Pfam): 

Scores for sequence family classification (score includes all domains): 
Model Description Score E-value N 

PF00089 Trypsin 274.8 1.9e-86 1 

Parsed for domains: 

Model Domain seq-f seq-t hmm-f hmm-t score E-value 

PP00089 1/1 ^TtI 39TT: i 2i9~fi 274.8 1.9e-86 



FIGURE 2 

WO 02/26947 PCT/US01/29960 



1 TTATATTCAT AAAAGTAGGC AGTAAGTTGA AGATTTATTC ATATAGGATT 
51 TAGTAGCTGC AGCTTTAACC TGTGGCTTCT GTAGCTTTTG TAATCTGGCA 
101 GTGCGCATCT GCTATATTAT CTAAATGTTT CCTCAAAAGG AGAAACACTC 
151 TAACAACTTA TCACCCTAGT CTGCTGGCCA CCATTTTCCC TCAGATGCTC 
201 ACAGCTTCTT CCGTGGGATT TGAAGATATG ACTTCCATGA CACTTGATCA 
251 GTATGTCAAT GGGTATTGAA CCACTCTTCA GCTCTGATCC CACGGTTCAG 
301 TTCCTTTCAG TGTGACTATG TGTCTTGGTG GTGGGAGATG TGATTCTTTT 
351 ATCTACTTTC TCCATTTATC TTACTCAGAG GAACTGTGCT CTAATAGGGA 
401 AATAGATTGA AAGCTTATAA ATTTCCTTGA GTTTTAACTT TTCTCCTTTG 
451 GTCTTTTTTT CTTTTCAAAT GACTTGAAGA CACATTGATA AGATTCTATG 
501 AGAAAATGAA GAGTTGAACA AATTGAATAT GTATGAGTGA ATGAATAGAT 
551 TAATACATAA ATGATAAATT TATTAAATAA TTTGAACGAA ATCAATCGAG 
601 AGGCACCGAG AATAAATTTG TGTCCTAGAA GTAAGAAGAC CTGAGTTTGA 
651 GATAACTAGT AGTTCTATTA TACTGGAGAA ATTACTTAAT CATCACTGGA 
701 CTTCATTTTT CTCATATGGA AAGTAATTCA ATCACACTAA ACAATCTTTA 
751 AGGTCTCCTT CACTTATAAA TGTATGTTTT AAGCCATTTA GGAGGTTAAA 
801 TAATGTCATG TCCCATGGGA CTTCTGTTTG TTGTTCTATT CAAGCATGTT 
851 AGCTTGTTTC TATCACAGGA CCTGCTGCCT TTCCGCAGCC AGTTCTCTAG 
901 ATTATTTTTA ATCAGTCGGT GCACACATGG TCAATATTTA CTCAATAGAA 
951 TTCAGGTTTC CCAAATTCCA TGAGGATTCT TGATTAATTT TATTACTTAT 
1001 GCCAAAACTA TTATCTTCTT AACTATTTTA GGTCCAAACA GTTTTAACTT 
1051 TTATCCTGGC ATTTATATAT AAAAAACTTT TGTAAGACCG GGTGCAGTGG 
1101 CTCATGCCTG TAATCCCAGC ACTTTGGGAG GCCGAGGTGG GTGGATCACC 
1151 AGGTCAGGAG ATGGAGACCA TCCTGGCTAA CACCATGAAA CCCTGTTTCT 
1201 ACTAAAAATA CAAAAAATTA GCGGGGCGTG GTGGTGGACG CCTTTAGTCC 
1251 CAGCTATTCA GGAGGCTGAG GCAGGAGAAT GGCGTGAACC TGGGAGGCAG 
1301 AGCTTGCAGT GAGCAGAGAT CACACCACTG CACTCCAGCC TGGCAGCCTG 
1351 GATGACACAG CGAGACTCCG TCTCAAAAAA AAAAAAAAAA AAAAGAAAAA 
1401 AACTGTTTTA TAGTCAAAAG AAAAACTTTC TATAAATCAA CCAATCCTGT 
1451 GAAGAAAATA TGAAAAATAT CCTCTGTTTC CAAAAAAATT TAGGCTATCA 
1501 ATATATACAC ATAAAGAGAT AAACTCTGAT AAATTGGATA AATAAAATTC 
1551 ACTATAATAG CAAGTTTTAG AGAACAAGCA CGGGAGTTAG TCGACCTGGG 
1601 CCCTTAAACA GATATCCTCT CTCTCATCCT GTGTTATTTC CTGTGTAATG 
1651 TTGGTATCAT TCCTGCCTGA CTCTCATAGA TTTATATGAT TCCTACTCTG 
1701 TCCAGGTGCC TTATTGGGTC TTAGCGGTAA AAAGATGAAC AAGGCTAATG 
1751 CAGCCCATTG AGAAGCTATC TGTAAGTGAA CATACATGCA AACTAATACT 
1801 TGATTCAATG TGAGAAGCAC TGTTGCTGAT CATAGGTGCC AGAAGAACAG 
1851 CAAAGAGTTA TTTTTTCCTC CAAAATTGTG GAAAAATTTT TATCCCCGGT 
1901 GTGATGCAAT AXAAAATACA CAGCACCACC TTTGAAGTAT TCTTGCCAAA 
1951 TGAATTTAAC CAAAATCTAA TCAAGACTTC AGAGCTAAAG AAAATCTAAA 
2001 GGTAATCCAA TTTATAGGAA ATGAGGGATA TAAAAGAACA AGTTAAATAA 
2051 TACCACAGGA AAGCATTCAG ACAAGTCCAG AAAGTAAGAT ATTCTAAAGG 
2101 ATGTTTAGCT TGATCTCTTC AACAGTCAAT GTCATTAAAA ACTAAAAAAG 
2151 AAGCAGGACT CTTTTAGATT AAAAGAGATT AAAAAGGCAT AACAAACAAG 
2201 TGCACTGCAT GGTCCTCGAT TATGTCTTGG CTTTTACAAA TCATGTGTAA 
2251 TTATAATGAA ACCATGGAGG GAACTTGAAG ATGGACTGGG TATTAGATGA 
2301 TATGGCAGAA ATATCATTAA TTTTTTAGGA GTGTTAAGAG TATCATGGTT 
2351 ATGTTGGATA TATCCTAATT GTCTATAATA ATGATTTGGT AAAAAGTCAC 
2401 GATGTTTTAT TTCACATTAA AATATAGCAG CAGAAAAAAT AAATGAGCCA 
2451 AATACAGTAA AATTTTCAAC AATTGATATA ATAATGTGAT ATATATATGG 
2501 ATGTTCAATT ATACTATTCT TAGTAATTTT TTATGTCTGA ACATTTTCAT 
2551 AATACTTAAA AATAAAAGAT AAAAGATAAA AATAAATGAG ATAATAGATT 
2601 TAAAATCACT TTGTAAACTC TAAAAGGATA GACAGATAAA AGAGATAACA 
2651 AAGTGCTGGA GAAAGGAGGA ATGGTCCCTT TTCAAGCATG TATGCCACCT 
2701 TQGACCATGC TGCTAAGAGA AACCATTCCT GACCACCACA AAGAGGCCAC 
2751 CAAATGCCTC TAAAATAGAA AGCAGGAGCA ACATTAGGAT TCCCAGATCC 
2801 TGATATTTTT TTTTTAACAC ATCTTCTCAG ACCAAGATGA CATTGAACAA 
2851 AATTAAAGAC CTTTTTGCAG GGAAAGGTAG GCTACAGCAA CTTGAACTTG 
2901 TCTAAGGAGA GCTGGAAAAC CTGCAAGCAT TGCTATCTGA GAGTAACCAG 
2951 TGGGCCCTTC CTTTTCTCAG GACAGTGGGA TTTGGCACCC GAAGCAGAAA 
3001 TGCTGAAGCC ATGGATGATT GCCGTTCTCA TTGTGTTGTC CCTGACAGTG 
3051 GTGGCAGTGA CCATAGGTCT CCTGGTTCAC TTCCTAGTAT TTGGTAGGTA 
3101 AAATTAAAGA TTTCACTCTA TTTGATTTTA TTTTTCTGCA AAGCTCCATT 
3151 TACATATATG TAAATGTAAC TTCATCTAAA AAATTGCACA TTTACCTTCA 
3201 AATTTCCACA GAGTATATTT AACTGTTTCA GTCATTTCAT CAACAAACAA 
3251 GTACTAAATT CTTATTATAT GTGAGTACTT TTCTGGATAT TCAAGATACA 
3301 GCTTTAAGCA AAGTAGACAG ATTTCTAATT TCCTTAGAGC TCTCAACCCA
3351 GAATTCTTTT GAGAATCTAC ACAAAAAGAT CAAAAATTGT AATTGTCTGA 
3401 AACTTACTAG TAATTATAAT AAACAACTCA TCACTTATTA TATATTAAAA 
3451 TGAAAAGCTA TGATAAATTA GTTATTAAAA TTGGCTCTTT TACTCATGAA 
3501 CCATCATTTT CTGTCCAACA TTTCTAAGGC AAAAGAAAAA CACTTGTCTA 
3551 ATAAAATAAG GAATTTCAAA ATGATTQAAA ACCTATACGT ATGACACAAT 
3601 ATTATCATTT ATTTTTAGAG AAAAAAAATT TTACTCTTTC CAAAACAATA 
3651 TTCAGGGATT ATATTTTTAT CAACTAATAT ATTTGTAATT ACACAAATAA 
3701 TGCACTTCAA GATTCTCTTT TTACATTCAG TCTCTTTCTG GGGAGAATGC 
3751 AAGCCATTTA CATTTTTTCA CAAATCTCTA CAATGTGACT CTCACATGGA 
3801 TGTATGTGAT AAAACAAATA ACTCAGGCTG CTCACTTTAA CGCTCTTATC 
3851 TGCTGTCACC TTCACAGAGT CAATGGGGGA GCAAAGACTC TACTTGGAGC 
3901 CTTAAAGGGC TTAAGATCAT AGTCCTAGGC CTTATATGAT AACCCCAGCT
3951 GTAGTTTATA CCATTGGCAA AAGATTCTCA GGTCACTTTA TTTGGTTGCA 
4001 TAAAAGTCTC TTTACAATGA GAGTAAGGTT TGTTAACAGT ATGGATTATA 
4051 TGGGTAAGTA ATCAGGATGT CCAAAAATGT ATTACAAGGT CCAGAGATTT 
4101 CCCACTTAAG ACATATGCCT TCCTGATATC CCTGTTTCTT TCCTTGGTTT 
4151 GTAGTCTCGA AACCCACTCC CTCTTCCCTG AGCCAGGCTT CTCAAGGATT
4201 GAGGTTGTTT TGTATTTTTC CCATTCTCTA TCTTTAACTC TGTATCTTTC 
4251 TTACTCCCTC TGGGCCTTAC TCCTCAGATT ACCAAATTCC TTAGGAGTCT
4301 CAACTGCTTT CCTTTCTTAC ATTTCCTAAT AGATTTATCC CTGTTTCATG 
4351 CTCGTCTTGT CTTCAATCTC AGACAGCTCT TCTCTACACT TTCTTTTCAG 
4401 GXTTTTCTTA GTGTGCCTGG CTCTCTTGTT AAAAATCAAA ATTCACAAGG 
4451 ACATTCACTT ATCTCTACTT CCACTAGAGT GTATGATGGT ACACATTTCA 
4501 ACTCAGCAAG GAGCAATGTA GCAATGAAAT GTTCAAGCTC TACAGCTAGA 
4551 CTGGATTTAA AACTTGGACA GGCCACCTAC TAGTTACAGA ACAATTTACT 
4601 TAATGCCTCT GTGCCTTAAT TTCCTTATCT GTAAAATGAA GGTGATACCA 
4651 ATCTTAGAGA GCTGGTGTGG GGATTAAATG GGCTAATACA TAAAAAGTGC 
4701 ACAGGACAGT GCCTGCCATA TTGTAGAAAC TCAATAAATG GCAGCTATTA 
4751 TAATTGATAT AAAACATTAA CTGTTATTTT TTAAATAAAA CTCAATTATG 
4801 AAGAGGCTCA GGGACATATT CAAGATTTAT ATTGGCCCCA TTGTAATTGA
4851 GTTCTGAAAT CTTTGTCCAA ACCATTTAGT TTCCTATTTT TCATTTCCAT 
4901 TGCAGACCAA AAAAAGGAGT ACTATCATGG CTCCTTTAAA ATTTTAGATC 
4951 CACAAATCAA TAACAATTTC GGACAAAGCA ACACATATCA ACTTAAGGAC 
5001 TTACGAGAGA CGACCGAAAA TTTGGTGAGT CAGGTAAACT TCTTTTTATC
5051 ATAGAATAAT GCAAGTGGAA GGGATTTTGT GGATCATTTC TCCATTTCTA 
5101 AAAACATGAT TTTCAGACCG CCAACATTAG AATCATCTTG CAOATTGCTA 
5151 GGCCCCATCC CAGACCTGCT TAATCAGAGT ATGATGAGAT GGGTAGGTGG 
5201 GGAGAGGAGA GTAAGGGAAT CTGCATGTCT AACAAATGGG TGATTCTAAT 
5251 AAGCCTCTCr TTCTAACTCA GCTACCTTAT TTAAAGGTAA GAGAATTGAG 
5301 GCCAAGATAT CCTAGCCCGT TTCTTCCCCA ATTCCACCAC GTTTCCCCTG 
5351 TAGAAAAGCC TAATCATACC AAAACTAGTT TTTATAAGTC CACACACTTG 
5401 ^^^ AAGAC CACATTTTAA GATTTTGAGT ATTTTCAGAA TTTACGTTCA 
5451 TCTTGTAAGT ATATTGATAA AGACAAAAAA CCAGACTTAT TTTGTAGTAA 
5501 TCAAGTCAAA TGCTAATAAT TTTGTTAAAG CTAAAGTGCA AGACTGCTCC 
5551 CAAAAAGAAA AAAAGCACAC TCAGTTGTAT AATCATTCCA CTCAGAATGC 
5601 CCATGAACTC TCACTCAAAA ACTAGGTTCA AATTAATTTT TCTAACAAGG 
5651 AAGCACAGAA GCAGAGACTT ATTTTAAAAA GAAAGAAATG ACAAATGTAT 
5701 TGGTTTGTTT TAATCAAAGA ACCATTTTTA AGACACTTTC TTTCCCAAAT 
5751 CATCTACCAT TTTTTCCTGT CATCATTTGC TCTTTGTCCA TAGTATACCT 
5801 AATGGCATCA TATTTACAAT AATATTGTAG AGTTTATAAT CTCTATTTTC 
5851 AGTTAACATT AAATCATTCA CAATTTCTTA ATTTTGTGGT TTCATCTTTC 
5901 CCAACCAATA ATTAATGTCT ACAGATTGAT ATAGATTCTG CATTCTTTCA 
5951 CATGCAGAGC ATCTTATAAA AGAGCATTTG CAATCAGTTC TTAAGTTATG 
6001 CTAGGATGAA CGGGGAGCCT GCACCAATAC ACCCAAATAC CTTCTCTACT 
6051 CCTCCAGTCC TAAGTGACTC CACATAACCT CCTCGATGCA AAAAGAGAAA 
6101 ACTCTTAACT TGCCTTAGTT AAAAAGATAA ACACACCTTT GAATGATGGA 
6151 AAATGTTACA ATTTACTGGG AAATTTTGAA ATTTGTTTCA TTTATATTTT 

6201 ^^? CCAACA TTACTGCTAC TGTTGTTGTT GTAAGTTAAC TAGGCAATTC 
6251 TGTCTTTACT GAAGTAAACG GACAAGAATG CAATAGGTCT TAAAAGAAGT 
6301 GAGAGAAATG CAGAGGTGCA TGTTGAACAG AAACTCTATT TAAAAGTGGA 
6351 GTTTTAAGTT TCACCTAAGC ATGTGTTCCT TCAAAGGCTA AGGCTAAGTT 
6401 AAGTAAGGAC ACATTATCAT CATGGGTACC TGCAAGGCCC TTCTCTGGTT 
6451 GTCATTATTT ATTTATCCTC CTTTATCACC ATAGCATAAG CCCTTACCCT 
6501 CCCCCCTTGC AGGAAATCAT TCTATGTTTC ATGTGGTATT CTTTTGTTTG 
6551 TATTCATTCT TACAAAAATA TGTTTTGCTA TTTTGCQTAC ACTTGCTTTT 
6601 AACTTACATT TTGTGTTATA AATCACTTTT GTTTCATCTC TTTTTACTGA
6651 GAACTTTTTA AAAGATATAT GTTACTAAAT ATACCTTTAG TTTATTGCTG 
6701 TTAGCTGCTA ATTCATAGTG TGTATCTTCC ATATTTACCT GCCTGTCATG 
6751 CCAAGAAATG CCACACTAAA CAGACTCCTA CTTACCCCCT TATAGACCTA 
6801 TGCAAGTACT TCTGGAAGCA GAATTACTAG GTCATTGAAT GTACATATAC 
6851 TTAACTTGAC CAATTGGTGC AGGTTTGCTC TTCAAAATGG CTGACTCAGT 
6901 GTGCACX3CCC AT CTACAATG CATGAGGATT TCTATGTCCC CACATCTAAC 
6951 CAACACTTAG TGTCTTAGTA TGTTTAGGCT ACTACAACAA AAAATACCAT 
7001 AGGCTGGGTA TCTTAAACAA CAAACAATTA TTTCTCATAG TTCTGGAGGC 
7051 TGAAGATTCC AAGATGAAGA TGATCAAGGC TCTAGCAGAT GTCTGGTGAG 
7101 AGCCTGCTTC CTGGTTCATA GAATACCATC TTGCTGTGTC CCTCATGGCA 
7151 GAAGCCATAA GAGAACTTTC TTTTGTAAGG ACACTAATGA CTTTCATGAG 
7201 AACTCCACCC TCATGACCTA ACTATCCTCC AAAGGCCCCA TCTCCTCTAT 
7251 CATCGGTTTG GGAGTTAAGG TCTCAAAATA TAAATTTCAG GGGAACACAA 
7301 ACATTCAGTC CACAGCACTT GGTATTATTT GGCTTTCTAA ATTTGCCACC
7351 CTAATATGTA TAAAGTAGTA TTTTATTTGT GATTTAATTT GCATGTTTCT 
7401 AATTACTAAT GAGTTTGTGC ATTGTTACGT ATAATTATTA ACTTTTTGGA 
7451 CTTTCATTTC TATAAATTGC CTGTACATAT TATTTGCCTA TTTTTCTGTT 
7501 AAACTTGCTT TTTCACCTTA TTTGTATTGC TTTGCAGAAG TTCTTTACAT 
7551 TTTCTGGATA TTGATAGTGT GTTGGTTGTG GACACTGCGC TTATCCATTC 
7601 TGTCTTCTAC TAATATGGAC CGTGTTGTTC TTTATGAAAC CGAAATCTGT 
7651 AACTGAAGTA ATCATTTTTT CACTGTTTTG CCTXATGATT GTATTTTGAA
7701 GCTTTTCTTT AAGAAGTCCT TCTTCCCTTC TAAGACATAA AAATATTTTA 
7751 CTATGTTACT TATTAACCTT ATAGTTTTAT CTTTTACATT AGGTCTCCAA 
7801 TACATGTGGA ATCCACCTTT GGATGTGTTA GGTAGATTCA GTTTTTTAAT 
7851 TCATATAGTG AGCCAGTTTT TGAATATAAC TAGTTAAAAT ATCTTGGCTT 
7901 TTCCTAATAT ATGGTATTAT TATTGAGTTC ATTGCATGCA TTTCTTGGCA 
7951 CCTGGGTCTT GCAGAAAAGG AAACATGAAT CTGTCTCCTC AAATTGCTTC 
8001 CAATCTTTTT GGAAAGATGT GAGTAACACA CATGGAATTG AATATC3LTGA 
8051 CATGATATAA TTAAGGGCTA AATTACATGT TGAGGACAGT AAGTACAGAA 
8101 AAACTTCAAA ACCAAACAAG GGTTCCCATG GTCAGAAAAG GCTTTATATT
8151 ATTTTACCTT TGTTTAAATG AGACAGGTGT TTTTCTCCTC CCATCCCGCA 
8201 CCAGGTTAGC TTTAGAAGAA TTACAGGAAG AGTTTATGCC TCATCCTGAG 
8251 CCACACCTGT TTGTTGTTGC TAAATCCCAA TGAATACAAC CAGATTCTTC 
8301 TCTCTGTCCT ATATGGGTGC TAATTAGACA ACCAAGGAAG AACAGGTTGC 
8351 ACGTCCTGTT CTTCCTCACA TTGGGCTTTA CTGATTTGAA TGCAAATTGA 
8401 GATGCAAAAG TAAAAATGAG TTCATATTTA GATATTGCTA TAATCCGCCC 
8451 CTGTTCCCTG AGATAGTGGA GCAGACATAT CTCATCTCTC AXATCATTCT 
8501 TCAGAGAAGG GTCCATTAAT CAGACATTAC TGATGTCTGA TTACTGCCGG 
8551 CTGGCCATCC TGCAGGTGGA GAAGCATGGC ATCCAGCAGA AACTGACAGC 
8601 ATGCACTTTG AGGGAGGGAA GGATAAGCCA GGAATTTATG CTGAATAAGC 
8651 TGCCTAAGTA TACATGTTCA ATAAGTTCTA GGGGAAGTCA CAAATACTTA 
8701 TGAAAGGAGA AACATAACTA TGTGCAATTG AGCTTTATGT CTCTTCATGT 
8751 GTTGCATGTT CAAAAAATGG TGGCATTAGC ATGATCCAAG GGTGGAGTTT 
8801 TCAGCCATTT GATGTTCAAA GGTGAAGCAG AGGACACAAA ACCCTTACTA 
8851 TGCATCCTCT GTGAGTCAGC CAAAACCAGT CTGGACTGCT AGCTAGATTA 
8901 ACAAAGAAAA AAAGAGAAAG AAGATACAAA TAAGCACGAT CAGAAATGAT 
8951 AGAGGTAACA TTACAACCAA TCCCACAGAA ATACAAAAGA TCGTCTGAGA 
9001 CTCTTATGAA CACTTCTATG TAGATAAACT AGAAAATCTA GAGGAAATGG 
9051 GTAAATTCCT GGAAAAACAC AATCTTCCAA GATTGAATCA GAAAGAAATT 
9101 GAAACCCTGA ACAGACCAAT ATTGAGTTCA TACTTAAATC AGTAATTTAA
9151 AAAACTTACC AGCCAAAAGG AAAAAAAAAG GCCCAAACTA GATGGATTCA 
9201 CAGCCAAATT CTACCAGACG TACAAGAAAT AGCTAGGACC AATTCTAGTG 
9251 AAACTATTCC AAAGAATTGA GAAGAGACTT CTTCTTAAAT CATTCTATGA 
9301 AGTCAGCATT ACCCTAACGC CAAAACCTCA CAAAGACAGA ATGAAAAAAG 
9351 AAAATTACAG GCCAATATCC CTGATGAACA TAGATATAAA AATCCTCAAC 
9401 CAAATACCAG CAAACCAAAT CCAGCAGCAC ATCAAAAAGT TAATTTTCCA
9451 AAATCAAGTA GGCTTTATTT CTGTGATGCA AGACTGGTTC AACATATGTA 
9501 AATCAATAAA TGCGATTTAC CACATAAACC GAATTAAAAA CAAAAATCAT 
9551 ACAATTAGCC AGGCATGGTG GCTCACACTT GTAATCCCAG CACTTTGGGA 
9601 GACCATGGTG GGCAAATTAC CTGAGGTCAG AAGTTCGAGA CCAACCTGGC 
9651 CAACATGGTG AAACCCCATC TGTATTAAAA ATACGAAAAT TAGCCGGGCA 
9701 TGGTGGCAGG TGCCTGTAAT CCCAGCTACT CGGAGGGCTG AGGCAGGAGA 
9751 ATCACTTGAA CCCAGGAGGC AGAGGTTGCA GTGAGCCGAG ATCX3TGCCAT 
9801 TGCACTCCAG CCTGGGTGAC AGAGCAAAAA TCCATCTCAA AAAAATTAAA 
9851 AATTTAAGAA AATTAAAATC ATACAATCAT CTCAATATAT GTAGAAAAAG 
9901 CTTTTGATAA AATTAAACAT CCCTTCATAA TAAAAACACT TAGACTAGGC
9951 ATCGAAGAAA CATACTTCAA AATAATAAGA GCCATCTGTG ACAAACCCAC 
10001 AGCCATCATC ACACTGAATG GGCAAAAGCT GGAGGCACTA TCCTTAAGAA 
10051 CAGGGAAAAA GACAAGAATG TTCACTCTCA CTACTCCTAT TCAACATAGT 
10101 ACTAGAAGTT CTAGAAAGAG CAATCGAGCA GGAGAAAGAA GGAAAATGCA 
10151 TCCAAATACG AAAAGAGGAA GTCAAATTAT CTCTCTTTAC TGACAATATG 
10201 ATTATATGCC TAGAAAACCC TAAAGACTTT ACAAAAAGTT TCCAAAACTG 
10251 ATAAACAACT TCAGTAAAGT TTCAGGATAC AAAATCAATG TACAAAATTC 
10301 AGTAGCATTT CTAAACAATA ATGTCCAAGC TGAGAACCAA ATCAAGAACA 
10351 CAATCCCATT TTCAATAGCG ACACACACAC ACAAATGAAA TACCTAGGAA
10401 TACATCTAAC CAAGGAGGTA AAAGATCTCT ATAAGGAGAA TAAAAAAACA
10451 CTATTGAAAG AAATCGGAGA TGACACAAAT GAATGCAAAA ACATTCCATG 
10501 CTCATGGATT GGAAGAATCA ATATTGTTAA AATGTCCCTA CTGCCCAGAG
10551 CAATCTACAG ATTCAATGCT ATTCCTATCA AACTACCAAC ATAATTTTCC
10601 ACACAAAGTT AGAAAAAGCT TTTGTAAATT TCATATGGTA CAAAAAAAAA 
10651 AAGCCCCAAT AGCCAAAGGA CTCCTAATAA AAAAGAACAG AGCCAGAGGC
10701 CTCACATTAT CTGACTTCAA ACTATACTTT AAGGCTACAG TAATCAAAAC 
10751 AGAATGGCAT TGGTCAAAAA CAGACATATA AACCAATAGA ACAGAATAGA 
10801 GAACCCAGAA ATAAAGCCAC ACATCTACAG CCATCAGATA TTCAATAAAA 
10851 TTAACAAAAA TAAGCAATGG GGAGAGAACT TTCTATTCAA TAAATGGTGC 
10901 TGGAATAGCT AGCTAGTCAG AAGCAGAAAA ATGAAATTGG ACTCCTATCA
10951 CTAAATACAA AAACTAACTC AAGATGCAGT AAAGAATTAA ATGTAAGACC 
11001 ACAAACAATT AATACAAGAA CCCTAGAAGA AAACCTAGGA AATACTGTTG 
11051 TAGACATCAG TCTTGGCACA GAATTTAGGA CTAAGTCCTC AAAAGCAACT 
11101 GCAACAAAAA CAAAAATTGA TAAGTTGGAC CTAATTAAAC TAAAGAACTT 
11151 CTGCACAATA AAAGAAACTA TCAACAGAGT AAACAAACAA CCTACAGACT 
11201 GGGAGAAAAT ATTTGCAAAC TATGCATCTG AAAAGGTCTA ATGTCCAGAA 
11251 TCTGTAAAGA ACTTAAACAA CTCAACAAGC AAAAGAAACC AAGTAACGCC 
11301 ATTAAAAAGT AGGCAAAGAA CATGAACAGA TGCTTCACAA AAGAAGACAT 
11351 ACAACGCAGT CAAGAAACAT ATGAACAAAT GCTCCACATC ACTAATTATC 
11401 CAAGTAATGC AAATCAAAAC TACAGTGAGA TAATATCTCA TACCAGTTAC 
11451 AATGGCTATT ATTAAAGATT AAAAAAATAA CATGCTGATG AGACTGCGGA 
11501 GGAAAGAGAA TGCTTAAATA CTGTTGGAAA CGTAAATGGG TTCAGCCACT 
11551 GTGGAAAGCA GTTTGGAGAC TTCTCAAAGT ACTTAAAATG GAACTACTAT 
11601 TCAACCTAGC AATCCTACTT ACTGGGTGTA TACCCAAAGG AGTATAAACT 
11651 TTTTTCCCAG AAAGACAGCT GCACTCTCAC ATTAATTACC ACAGTATTCA 
11701 CAATAGCAAA GATGTGGAAT CAACCTAGAT ATCCATCAAT GGTGGATTGG 
11751 ACAAAGAAAC TGTGAGATAT ATATGTATAT ATATCTATAT ATACCATGGA 
11801 ATACTATGTA GCCATAAAAA AGGATGAAAT CATGTCCTTT GCAGCAACAT 
11851 GGATGTAACA CCACAAGGAA GGCACTTTTA TCTCCTCTTT ACAGGTAAGA 
11901 GAACCAAGCT TCTGAAATTA AGGTCCATAG CTGGAAAATG ATGGAGGGGA 
11951 GATTTGAAGT CATCTAGGCA ACTCCACACA TGTGCTCTTT CCACTAAATT 
12001 GTTCTACTGT CAGGAAGGGA CTCAGCTAAG ACAGAAGATA AAATTATTAA 
12051 AATCTAAATC AATTCTTCTC TCATTTCATT TTTTAAATCC ATGAAGATTA 
12101 TAAATCCTCT ATGCTGTGCT AGCTAACTTT TTCTTGACAG ATACATTAGG 
12151 TATACTTATT AGAGAAAAAT ATTCTCTTTC TCATTTCCCT GTATCAGTTT 
12201 TTGGTGAGGA AGGCAAAGGT AGGAGGAACT GTAATAGAGA AAGATGAAGG 
12251 AAGCTGATGG ATATATTGAC ATGTCTATGT ACATCTAGTG TGAACAATCT 
12301 ATAGTTGGAA GAAAGGTGTG GATGGGTATG CTTTTTGAGG GAAGTTTTTG 
12351 AGAAAAGAAG TAATATGAAC TATTTCTAAA TTTCCTGATA AAGTTGTAAA 
12401 TACAGCATAG TCTTCACAGG AGAATCTATT TAGTTTATCA TCATCATTCA 
12451 GCAAATACAG CATGATGTTA GGCACTATAA AAGGCTAAGA AAAATGATTC 
12501 TCTCTCTCTC ATAAACTAAT CCAATTTAGA GATTTAGAAG ACAACAAATC 
12551 TGGAGAGGAC ATGAACCTTC TAAATAATGA CCTTCCCTTG CTTTGGGTAT
12601 CCTGGTTTTA AATATTTTTA GTACAGCTTT AAATAGATCC AAATGAGATA 
12651 TTTTCCTCTT TTACAAAAGC AATTCAAAGA TCTAGGTTTT TGTTGTACAC 
12701 TGAGAATTAA TACTTTTTTC TTTAAAATCC TTAATTGCAA ATCTTTAAAT 
12751 TCTATAAATA TTTTGCCTTG TGATCTCAGA AATATAAGCC AATTTGGGAT 
12801 ATGGATATCT AATATATTGC TACTTGTTAC ACGTGAGTAG TGACAGATGT
12851 CTGTCCATTT CTTTCTGACA TTCCACAAAG AAACACTGAA GAAGGACCAG 
12901 TGCAATCAAA GAAATGACTG ATGGCATCAC AAAATATCAC ATCCCATTTG 
12951 ATGATCTGAT TACCTTTTTG TTTAGGGTGA TCAGAAAGTC ACAGTTTCAT 
13001 GGCACCCTCC ACACCCACAC ACCTTGTATG ACACTGGATC CAACTGCTTT 
13051 CTCCAATAGA CACAGCACTT AAAGATGTGG CAGTTAGGCT TGACCCCAAG 
13101 AAGGCCAAAA AGCCTTCTGT GAGCATCACT CAGTGCTCAG GTTGACTAAG 
13151 CTCTATCCAG GCTTGAGAGA ATGGTTCATA GCTGACTTCT TGGATCCAAA 
13201 AAAAAAAAAA AAACACCTAG AGTTTTATAC AGATATGATA CGAACTTAAA
13251 AGGACTGCAC TAAAAACTAC CAAGATTATG ATTCTTATTT TTGGAGAGTA 
13301 AAGAAAATAG GCTGCCTTTG GAGAGGGGTG CAACAGTTTC TGATCCTCTT 
13351 ACAAACTGCT TGCTGCCCAT CAGTGGGTAG GAGGTCTTAG TGAGAACCTA 
13401 CCTGCATGCT CATCCTGAGG TAGGCACTGT GAAGGCGTTA ACAGGCTCTG 
13451 AAGCTACATG GCCCTGGTTT CAGTGAACTC TGTGGTGTCA ACTTGGGCAA 
13501 GTCACTTCCT CTTCTATGAA ACGTGAATAA TCATAGTACT CACCTTAGAG 
13551 GGCTGATTTG AAAGCAAATG AGCTCAAACA CAATGACATC TGTGCTTGGT 
13601 GCATATATGG CAGACAACAG TGATTCCCAC XATTATAATT ATTACAGTCT 
13651 TACCAAGGAG GAGCTTTCCA CAAATAATCA ATTACCTAAA ATGTCCAAAA 
13701 ACAGGAAAAA AAAATCTCTT CCGATAATTC ATGTGTAATT TTCTTTTTTC 
13751 TCTAGGAGCA TTGATCTCAA CCTGATGTAA AGCAAGCACT TTAAAAAGTC 
13801 TTATAAAATT TTCCTGGTAA ATGCAAAACT TTCTGATAAA TAAATTCTCA
13851 CCTTTTTATC AATTTGTTAA TTCAACAAAA ATATACTACA TACCAACAGC 
13901 ATGCAAAGCA CTATGCTAGA TTTTATAGAC TATGAAAAGA TAAATTGCCA 
13951 TCTCTATGCA TAAAGGGTTT GCCATTTAAT AAAAGAGACT ATATATTTGC 
14001 ATAAATATAT AGTGAATATA TTGCATAAAT ATATAATATA TCTTTACATT 
14051 AAAGAATAAA AGGTATAAGA GGGATAAGAA AAATTGAGAC AGAGGGAAGA
14101 CAGGTCAGTT TOAGATTAAC GAATATCCCC AAAGAAGGTA TTATCTGAGA 
14151 TTGGCCTTGA AGGATAGTTG TGATTCAGGA ACACAGAACT TGCAGAATGA 
14201 GAAGGTTGTT ACAGACCAAA GGAACAGCCT GAGAGGGGTG AGTATGCAGG 
14251 AAAATGAGGG CCATGCCTGA AAGTACTGGT GGTGTTGAAG ATGGAGCCAG
14301 GCAAGTTGGT CACAGAGGGA GAGGACCTTG AATGTCTAAC ATTGTGGACA 
14351 GAGGCTCAAA GGCTCAAATT CCCTATTTTT ACCTTGAGTT CAATCCTTGT 
14401 GGCAATGAAA CCTCAGTGAA GCTTTATTTA AGGCTAAAAG TGTCTTTTAA 
14451 AAATCCCTCT TATATAATAT CCTTTGCATG TTACTCTTGT TGTAATTAGG 
14501 AGAAAGCAAT AGGATCTAAA GTTTTTTTTC ACAGCATGGT TTTGGTTTCT 
14551 TTAATTCTAA GGAGCTCACC TGGTGTTACG TTGGAAAAAA CAGCTTTTAT
14601 ATCTCATTTA TATTCCATAT GCCAGTCTGC AGTGACATAT CTATCTGAGG
14651 TTTACAGTGT TAGCCACAAA ACACTCCCTA AGTGAATACA TTGACTGCTG 
14701 TAAGGGGAGC CAGTCAGGAA GCACCTGCAG AGAAAAGCAG GCAACATGTA 
14751 TAAACAGAGT TAATTCAGGA ATGAAAGCTG AATGGCTGGG CGAGTCTGTT
14801 TGTTTGAGTT GACAGCCTCT CCCTCACTCT TTCATTAAAT ATCCAACTAA 
14851 CCTTCAATTG CCCTCTTGGA ACTTAATCTC AGTGTAATTT CCAGCATGTC
14901 AAAATTATCA AGCAGAAAGA GATACTACCC TGAAAGAGGG TCTTTTGTTC 
14951 AATGCTAGGA GACAAACTCC AACTACAAAA TTCTAGAAAT GCCCTAAAGA 
15001 GAGAGATAGG ATAGATTTAC AAATTGCTAA TGCTATTAGG TTGTATAGAT 
15051 AACAATAGAT TTATAACAAC CTGGCACACA GCTTTAAATA TATAAGTTTC 
15101 TCTGAAACTT CTGGGAACTT GGAATGCCAG AACGTTGGCA AAAAGAATGC 
15151 TTCTAATAAT GAAAGCCATC ATCTGCCATG GAAACAATTT CAGGGTCTTT
15201 AGAAAGCTAG TTTATACATA AGCTCCATTC TACAATAAAA CTTATGTTCA
15251 TGTTTTTTCT GATTTTCCTC CTGCTGTAAA TTCATTTTAT CAGAATTCTT 
15301 TTTACCAGTC CCTCTGCCCC ATTTCTCAAA GCGTTGTCCT CAGACTACCT 
15351 GTATCACCTA AAGATTCTAA GGCCTCCTCC GATGTAGTAA ATGAGACTTT 
15401 TCTAGAGAGA GAGTCCTAGA ATTTTATAAA GAAGGATCCT TTTTATTATT 
15451 GTGATCACCA AAGTTACTTC TGCCTAGATT CTTCTCATGT TATTTTTACA 
15501 GCTCCTATCT TCCCAGACAA CCTAACAATT CAAAGATAAA ATTGGTGCTT
15551 GGTTTAGACA TTCATAGCAG GCACGGTGCC AGATTGATGA TGTCATCCAG 
15601 AGTCAAAAAC TTCATCCAAT GCCTTCACCA AAAAGTTACA AATGGCCAGG 
15651 AATCAAATGT GGTTGAACTT ATTCAGAGGG TAATTACAAA ACAAACTTCT 
15701 TTAAATACCC AACTGCTATT TGCTTTTTTC CTTCTAAATT GTATCACTTC
15751 TCTCCCTGTT CCATTTTCTT TGCCTTTTTA TTTTTTGGAA TCCCTCACCT 
15801 CCATACTGAG TAGTAGAGCT GGCTGTGGGT GATOAGAGAG AAATTGTTAT 
15851 AACAAAGTCA CCCTTTCAAA AACATGTCTT CCAAAAGAAT TTTGTTTCTA 
15901 GCAGATAAAC CCCACACCAC CTCAGCTAAA TGGGGCTTTC TTTATTTAAG 
15951 TACCAATAAA GACATATTTT GGATACTAGC AATTTATTTT CCAAATGCTA 
16001 TCTTTGATCT TAAGTTTAAG GCTATTACCA AATCTATATC TCTACAAGTT
16051 TTATACTTTA GGTCAATAAA TTACTTGATA ACTTATTACT ATGTGTTCTA 
16101 CAAAAGAAAC CGAAGTAAAA TTTACATCAC ATTTAACAGG GTGGTTGTGT 
16151 GATTGAGTGG GAAOAGGCGG ACCCTAGAGA TAGAAGACTT GGGTTTCAGT 
16201 CCCAGCTTAC TAGTATCTGC GTGATGCCAG GGAAATTCAC ATAATGCCTC 
16251 TGAGTCACAG ATTTCTAACA GGAATGAAGA TACTTCTTCG CAGAATTGTC 
16301 ATTAGAGTTA AAGAAGATAA CAAATAATGT GGTTCCTGAT GAGGTATTTA 
16351 TGAATTCCTG AGCATGCTAA GGAAGTTATA ACTTGTTTCT TGATCCCTGA
16401 AACAGCTTTC CCTATATTTQ TGTGTGTGTG TGTGTGTGTG TTTCAGTCAT
16451 GCAAGTTGGT TTTTCTTCTC ATTCCTTGAG AATTTAGGAT ATTTTGTGCG 
16501 CACATTTGGT TCTTCTGTCC AACATGAACT GTAGTACCTT ACCCACATTG
16551 AGATGACACT ATTTCTACCA AGTGAGTGCT AGGGGATACT GCAAGCCGAA 
16601 TGCCAGGTGT GAGAGACCAC AGCATCACAA TACCGTGGCA GTAGATTAAA 
16651 GCTGTGCATA TGGACTAAAA GCAGTGGCTT TGCTTCTCCT ACCTTGGTGA 
16701 CATAAACTGA GTAACAAATT TGACCTAATA CTGGAATACC ACCTAATTCT
16751 X muIu!" 1 " CCTCC CTGATTTACC CTAGAGTCCA CAATTGACAA TAATTTAAAA 
16801 ATTTTGGCTC TCTCTTAAAT CCCTAATGCC TCCTCCTTAC ACCTTACAAG 
16851 CAAAGACCTG CAGAGCTAAG ACCTGTAATG CCAGGATGGA GGCTAGAGGA 
16901 CCATCAGCAA TTAACTACCA AAACTTACCC AACATTTTAT ATCTGTTTAA 
16951 CCTTCATAGC CTTATGAGTA GCAGATCAAT ATCTTTGTTT TACAGGTTAG 
17001 AAAACTGAGG CTCAAATTGA TTCAGTAACT TTGCCAAGAT TGCCCAGTTT 
17051 GGGAAAAGTA GTATACGCTC AAATCCAGGA CTGAGGCAGG GTTTTCTTTO
17101 TCACCACTCA AAGCCTCTCT GAATATCCTA TCTCTGCTCT GTATCTCTCT 
17151 GCTACTCCTT CTATGGTGTT TTAGCAAGAT ATCTTCTACT CCAGAAACCT 
17201 ACTCTAGCAC AGTAGAATTA CTTGGGTAGG TTTTTTAAAA ATATGAGTGC 
17251 CTAGGTCCCC TCTAGACCAA TCGAAACCAA AATTCTTGGA GAGGATCCCT 
17301 GGCATCCATA AATTTTTTTA ATTCATCAAA TGATTCTGTT GCACTGTGAA 
17351 AGCTGAGATC CACCAATTTA AATAATGATG TTAGTTCTGT GAAAAAATTT 
17401 TTGATTGCTT TAACATTTAA TCAAGGATAT ATTCCTATTA TAAAATATAT 
17451 TATTAACACA TAGTTTCTCT CTTGTTGTGT AACAGGTGGA TGAGATATTT 
17501 ATAGATTCAG CCTGGAAGAA AAATTATATC AAGAACCAAG TAGTCAGACT
17551 GACGTATGTA TGTTTGGGCA AAGGTGGAAT CACAAGACTG GAGGGAAAAG 
17601 GAACAAAGGA GACAGGGACT CTCATGTATT GTATGTCTCC ATGGACTAGG 
17651 CTTTTGGCTA GAATTTTTCA TAAACATTAC CTTTAAAGCA GTCTTGAAGT 
17701 ATAGGGCTGA CCACCGTTTT GTCAACAAAA AGACTAAGAT TCAGGAAGGG 
17751 TAAGAAATAT GTTCAAAGTT CACCAACTGA CAGTTTCCCA AAGTGACAGA 
17801 ACCAGGAATC AAACCCCATT AACTTATTGT GAGGCCTGGA ACCTACCAGA 
17851 ACCCATGACG TGGGGAAAAC CCAGCAGCTT GTCGTTGCAT GCACCAAGTT 
17901 AXATTATGTT GACAATTATA TTATTTCAAC CACGTTAAGC AGGCAAACTT 
17951 GGCTATAAAA TGGGTTCACA AATTTTACCT GTAATGTAAC CGAATGACAT 
18001 AAGGCATGCC TAAACAAAAA GATATTCCTG TTGTAATAAA 1 TTTCTTTCT 
18051 GTCATGGTGG AGGGGGAAGA CTCATATCAG TTGCAGATAT TGCTCAGAAG
18101 TTTCAATTGT GTTATTTTGA AAAACTACAT AGCAGAACAC GCATGTCATA 
18151 TACACAAATC CATGAGCCTG TATGACTCAT ATTTCTTAAA GATAAAGAAA 
18201 AATAATATAT TCAGATTTTG ATTTATTTGA AGAAAATAAT TATCCCTTTC 
18251 TCACCAATAG ACTAATAATG CTTTGTTGGC AGGTGTACTC AAAGTTCTCT 
18301 ATGTCTTGAC TGAGTAACTA GTGACTTCCG TAAGGATTTT ATAACATAAA 
18351 T T ^ 3GTA ATT CCTACAATAC TTAGGAGGGA AAAAGCATAT AAATGCTAGA 
18401 ACTTTCTAGA TTTCATGTTT TCTGTTTTCA AATTCTCCTT TACCATATTA 
18451 TTGTAGCAAC ATTATTATAC TCCTGTGAAC TCCTTTGGAT GGTAGCCATC 
18501 ACTATATAAT ACCTGGTAAA AATGTTAATT CCTCAGATTT AAGAAGTAAA 
18551 ATTAGTCATC TGTTTGCCAA TTTGACATAA AATTCTAGTT ATTTAGATCT 
18601 TTATATTCCA GAGCCTAAAT GAACAAAAAT ACATAAATTG TCTCAGAATT 
18651 Awx-x-x-mGC CAAAAGATTC AGGGAGATGG GCCTCTAGAG TTTTTCACAG 
18701 TTTTTTTTTT TTTTGTAAAA AAAAAAAAAA AAAAAAAAAG GAGAGATAAC 
18751 AGATCAAIAT ATATTAGTTT CAAGGTTTTT TGTTTTTTTT TTTAAACAAA
18801 AACCTGTAAT TGCTTTTCCT ATTTTAACAG TATTTAAAAG TTTAGTTCCT
18851 CAGGTAACAQ AACTTGAACC TGTTTATATG ATCAAAGTTC AAGAAATTGG 
18901 GCATGTTTAA TTTGGAGAAG ACTCGGGGAC CACAATATTG TTGTCTTCAA 
18951 ATATTTGGGC TAGAGGAGGA AATTATTTTA TGTATGTTCC AACTCOTAGA 
19001 CCTAAGCCTT ATGGAATGGG AGATATAGGG AGACATAXTT CAACTCAAAA 
19051 TGATGAACTC TTAAAAGCAG AGCTGACCAA AGAGAAACAA GCCTCTTTAG 
19101 AAAATTAAAC TTACTATCTT TTTAATTACT GCACTGTCAT TAGAGGGCCA
19151 ATTGTCATGG ACCCTGTAGA AGTGATTCAG GTATCAAATA TACAATTGAT 
19201 TAGCCTAAGA AAACATGAAG GCTTCTTCTA ACTCTCAGAG CTTGTAATTT 
19251 TGATGATGAT TTTTTATATC TGTCATTCCT AGCTGCTGTA ACAATCCTTC 
19301 AAATTAATGG GGGAAATGCA CTGAAAACAT AATGAAAGCT AGAAGAGGGA 
19351 ACATATGAAA TGACCTTGGG TCAGAATGAC ATGAGAGGAT CAGCACTTGA 
19401 CACTCTCAGC AACTGAGGGA TCATTCAGGG GAGGAAGATA CAGGTAAGAC 
19451 TGAAGGACAA TTCCAGGTGT ATTCTTTGAA AATGTACCTT TCTTTTGTGT 
19501 GTCACAGTCC AGAGGAAGAT GGTGTGAAAG TAGATGTCAT TATGGTGTTC 
19551 CAGTTCCCCT CTACTGAACA AAGGGCAGTA AGAGAGAAGA AAATCCAAAG 
19601 CATCTTAAAT CAGAAGATAA GGAATTTAAG AGCCTTGCCA ATAAATGCCT 
19651 CATCAGTTCA AGTTAATGGT AAGGAGGTCC CCTTCTATGT GATATGAAGT
19701 TGTCTATTAG GTCCATGTTT TGACGAATCT CAAATTTATT TGTCATTATT 
19751 TCCATTTCAA ATAATAGCTA GAATTCAGAT GAAAAAATTC AAGTTAAAGA 
19801 TGTGACATTT CAAGGTGTAT TAGTCTCTAA CGTAAGCATG TCTGAAGTTA
19851 GTCATCCAGT GGTTTTCCCG ACAGTAATTG ATTGGCACTC ATCCCAAAAT 
19901 ATAGGCAAGC ATTTACAACT AACAGAGAGT TAATCCCACC CAGGCACTGC 
19951 CTCCATGACT AAGCAAGTGA AAATACTAGG GGTTTAGCAA TAATTGTTTT 
20001 TCTGGGTGGG ACCTTCCTAA AACACAAATT CATGTGTTGC CATACTTTTA 
20051 TTGATAGTTT CTATATATGG TGATATACAA TTTTTGTTAG CTTTTTTTrr 
20101 TATGGGCATT TCGGAAAATG GCAAGCCAAC TTTGAAGTTG TTAGAGTCAT 
20151 TTTACCATTA ATGCTTTAAA AATCACAGTC TAGGAAAACA TCACTGAAAC
20201 TATGTGTACA TTGTTCCACT TTTCTCTTTT TTTTTGTTCA CCCTTAGCCC 
20251 ATTATACCAT TATCACTTCC CTCAATTAAG GAGAACAAAC CTTTATCAAG 
20301 GTCTATCTCT ATGGCCTTTA CCTTAAGTAA CTAATTTCTT TTTATATTCC 
InH, GCAAA ' rrCAC CTTTATAGAA GTGAAATTCA CACAAAAAGA 

20401 GTTGAGGAAT TCAGTAATTA AAAGGAGCTA AGAATCAAAT TTAAATCTCT 
20451 AATTTCTTAA AAGGCTCCAA TTAAAAAAGG TTTCTATAGT £a1£^ 
20501 TTAAAAATTC TGGCTTTGAT ACTCGTTTCT TGGAAATTCT TCCTTATAGT 
20551 GTCATATTAA AAATTCTAAG GCAGCCAGCT AGAGAGAAAC TTGTTTACCC 
20601 TCGTCCGCTA AGCTGTTTGC ACAGCATCTT CTTCCAACAG ACAACTAtS 
20651 ATTTCTCCTA CAAATTTCAA TGGATACCAG ACCTAAGTGT TACAGAAGAG
20701 ATTCAGGGCA AGCGATTTTT ATCAGACATG AAACAGGACA CTCTGCCCTT 
20751 OTAAGGGTCT AGCTGACACT TCAAGAGGAA ACCAGATAAG GAAGTAAAAA
20801 ATGTGAGGTA ATGGAATGGG CAGATGTTTG CTGATGTGAG AACGACTCAG
20851 CTACTTAGGG AATAAAGCTG AGGACCTCTC CCAGCCAGAA GGGAGGAACC
20901 TGACAACTGC TTAATCCATC TTCTTTGTTA GATGGGGAAG CAAATGAATA 
20951 GAAGTTGTGA AACAATGGGC ATTCTGATAA TTTACATGAT GCTTTCTGTG 
« 2« ^ TTTCCAA TAAATAGTTA ATTTGTCAGG AATGTAAAAG CCTGAACTAT 
21051 CTGAAACCAG AGTAAAGCAT AAATTGTTCA TTGGCTGCCT GGTCTTTTTO
21101 TTTTTTGTAG GCTCAGOTTC TAAACTTCAG CTTA1TTTAA TAATTGTACT 
21151 AAATTAAATG GTAGGATATG CTAATGGAGA ACCTGATTTG AGAGTCACCT 
21201 GAGGCTGGGC ATGGTGGCTC AAGCCTATAA TTCCAGCACT TTGGGAGGCC 
21251 GAGGCGGGTG GATCACCTGA GGTCAGGAGT TCAAGACCAG CCTGGCCAAT 
21301 ATGGTGAAAC CCOGTCTCTT CTAAAAATAC AAAATATTAG TCAGGCCTGG
21351 TGACGGGCAC CTGTAATCCC AGCTACTTGG GAGACTGAGG GGGAAGAAXC 
21401 ACTTGAACCC GGGAGGCGGA GGTTGCAGTG AGCCAAGATC GCGCCACTGC 
21451 ACTCAAGCCT GGGCTTGACA GAGCAAGACT CCATCTCCAA AAAAATAAAA 
™J TTACCTGACC AATTCTAACT CCACTAAGTC ACCACAGGAC 

21551 CACCCAAATA ATTGGCTCAT GCCTTTGTCT TCATTTTCTC ATCTGTAAAA 
21601 TTCCAATGGT AATGTTTGTT CTTCCTGAAA TCACAGAGAG ATTATAACGA 
l X *l, l^^H^ AATAGAAAAC ACAATGTGAA ATAAAGAGGC TGTTACTAAT 
IVnll 2^^™ * rcA:K 5TTGTG CATATGCTTT GGAAACCTGA AATCATTAAT 
21751 TTGAGTGATT GACTAGTAGC AGAAAGATAG ATCCTTGAAA GTTTCAGAAT 
III 0 * GAAAGAACAG TGTTTGTTAG TGATATGGGA GCCTAGGGGG 

21851 TGTTGCTTTT CTGGCCAGAA ACCTCTGTGG CCAGTGGTTG GTGCCTTTGC 
1 X * 0X ^SZ™ CTCTGGCCC ^ CTGGGCTTCT TCTCCCCACT TGACCTGGCA 
21951 GACTGTGCCC ACCTTCCGCT ACCAGCCTGG ATCCCATGCC CACCAAGGCC 

lllll TGGAGCTGTG AGGGTTGTCT GAGCGAGCAC AGGGTCTGGC 

II 0 * 2 ; AGCCAGGCAC ACTGGCTGCA GCATGACGGG CAGCTCCAGG 

22101 CACTGGCACA GGTGTGCTGT CTCTCTGTGA GGCTGTGGCT GGACAAAGCT 
III** ^^2°° MCTTCCCTO GCAGGCACCT GGGAATGTGG TGGCACCCAG 
111 0 * 2^™^ GATGCCAGGA ACTGCAGGGT CCCAAAGAGG GAGTCACAAC 
22251 CCTGGCTTGG GGAGCTCCCA GGTCTGGGAT CCCTAAAGGG CTGCAGCTTT 
lllll J^CAAT GTGGCCAGCA AGGGGTATGT 

22351 TTTGTGTTAC AGCTCTTTTA GTCTTGCTAT TTGGCAGGTC CTGAGTTCTT 
IIVS ^^ GAGAC CAAGAAGAAT GAGGTATC3CA GACAAGTGGA GGGTGAGCAA 
22451 GACGAAGAAA GGTTTACTGA GCAAGAGAAC AGCTCACAGG AGACCCACAG 
22501 TGGGCAGCTC CTCTTCATAG CCAGGGTGTC CCAACAAGTG TCCAGCTCCT 

22601 ° GTAGAAGCT CCTCTCTGCA GGCAGGTTGT 

till* S^T* 5 ^ C3TTCAGCTTT CAGCACACAG TAGGCAGTAG GCCCTAGAGT 
J " 1 GGTCTATCTC CTCTCTGCAG GCAGGTAGTC CCATGGTCTC CCAGTCACCT 
22701 CTCCATCTGC AAGGCTCCAA TGCTGCCTCC AGGACCTCTC TGCCCACCCC 
III s * 1^^° ACO^GCTGC TCCCCCACCA GTGGGCAACT CAGCCCAGCC 
III 0 * SS£^? TGGT AGCTCCCAGG GTGGCAGGCT CTGGGGGGCT CCCAGGGATG 
22851 GGCTCCAAGG ACTGTCCACC TTCTCCCCAC GCCCTCCCTG CAGTGGCCAT 
22901 GGTCAAGAAT GGGAATGTGG GGCCAGGTTC CGGAGCAGGA GAGGCTCCAG 
HI 5 * AGGTCCTGCC TGGTCACGTG AGGTTGGGGG TGGCACAGTC 

23001 GGCTGCCTCA GGGATGTGGG ACACAGGGGA CCCACCACCA TCACTGCTAC 
23051 TCCCGCATCC GCTCCTGCTA CCACTGCTCC AGACAGCCTG TAGCTOCCAT 
23101 CACTAGCACT TAAGAAAGGC ACATTCAGTG GACAGCTCAG GAAAATCTTT
23151 ACGTCAATTT TTTATAGGCA AAAACATTGT TTCCTGGGCA AACAAAATTT 
23201 ATGGACTACC AATAAATAGA AAACTGTAGA GATTCTAGAT TAAGTCTAGA 
23251 AATAATCCTG TAGCCCAAGA TTTATTTATA ATTTGTCAAG AATCTGTATT
23301 TTGTTTTGAC AAAAAAAAAA CTGTGTGGTG TGGGTCCTTC AGGAGACACA 
23351 GTGTGACAAA GCAAAGCTAA AATCAACTTC TTTGCATTGC AAACACCAAG 
23401 GCTGTAGTCA AGCAGCTCAC TGCCTATGTG TCAGATGACT TTGCTTCATT 
23451 TTTCATCATG ATACTTGTAG TCTATAGAGC CCTGAATATT AACTAGCTTT 
23501 CTCCCAACTC AGAACCGTGT TAGGAGGTGG TTGCTTTCAA AACTAAAGTG
23551 TTAATGTTTA TTTCCATTTC TATACCAGGA AAGTAAAAAT CTTTGGTCAA 
23601 AATTAGAAAT CTTTAACAAC TAGTTACTTG TGTATTGACA GTTTCTTTCC 
23651 AGGTGTAATC ATTCTCCCTT AAAATCCGGT TATATTCACG ACCATTATAC 
23701 TTATCCTGGT ATCATTCCTG GAAATGGCTA ACTTGCATCC TGCTCAGACT 
23751 AAGTTGACAA AGTTTCAATT GAAGAATTCT AACTTTATGC TATTTTCCAC 
23801 TTTATTGCAT TACAAAGGAC AAAATATATA GTTTTCTTAA AAATGAAATA 
23851 AATTTACTGC CTTAAACTAC ATTTGACGGT AAACTGAGTT CCTTCCATAG 
23901 AATAACCACT AACAGCAATC GATGGTCCTG AGCAATTGAC TCTTCACCAT 
23951 ACAATGATTT GGGATGCCTT TAAGGGTATA TTTGAATTGA ATATTTTCAA 
24001 AAGCTCCCAC TTTGTAGAGT TTATCATCAC TAGTTTCCCC AGTGGAATTT 
24051 GTAGAAAGTT AGTAGAATGA AACAATCTTA TTTTGTATAA TGAGGAATAG 
24101 AATACTGAGA ATGTGTCTGA GAAACATGGC ACTGGTAGGA AAAAGTAAAC 
24151 AQTTTATTCT CATCTGCTCA ATAAGCTAAG TCATTTTAAC TTGAAAATCA 
24201 TCAAAATTTT CATGAAACCT TCCACCAACT TTATTTTTCC CCAGCTTTAG 
24251 TAAGATATAA TTGACAAATA AAAATTGTAT ACTGTATACA ACATGATGCT 
24301 TTGATACATG TATACAAGTT TAAATATTTG TGTTTCCTTA GTCAAACTCC 
24351 TCACTTTTTT GGAAGTTGAC AGAATTTAAT CTTGGATTGT GTCCAATAAC 
24401 TAGCTTTTAC CACTATTCAG TATATTTTGG ATAAGAAACA CATAACAGTT 
24451 TATTCTTTAA AAAAGCAATT TTACTATTTA GGAACTGTGT TTAAAAAGCA
24501 TTTTAAATAT CATTTATGCA AGAGTTTTCA AGGTTTTTTC ATTCTAAACC
24551 CTTTAACCAA AAAAAAAAAA AAAAAGATTT ATGTGAAATT CGAAGTAAAT 
24601 AGAAGAGATC AAAGCAGATC TGTTCTGGCT GAGGCTGAGT TTGAGACCTG 
24651 TAAGACAGTC TACTTGCCAT ATGGCTTGGC TGTGTCCCCA CCCAAATCTC 
24701 ATCTCGAATT GTAGCCCCCA TAATTCCCAC ATGTTGTGAG AGGGACCTGG 
24751 TGGGAGATAA ATTAAATCAT GGGTGCAGTT TCCCCCATAC TGTTCTATGG 
24801 TAGTGAATGA GATCTGATGG TTTTATAAGA GGCTTCCCCT TTCACTTGGC 
24851 TCACATTCTC TGACTTGCTT GCCACCATGT AAGACATGCC TTTTGCCTTC 
24901 CTCCATGATT GTGAGGCCTC CCCAGCCACA TGGAACTCTG AGTCCATTAA

24951 AccTcrrrrr ctttataaat tacccagtct cagatatgtc tttatcagca 

25001 GTGTGAAAAC AAACTAATAT AACCTGTTTC CTCTGTCCCA TTTATCCATC 
25051 TTCTGAAGTG GAATGCAAAG AAGCTTTACC CCGAACTGCT GGAAAACCAT 
25101 AGTTCTCTAT TAATACAAAC TATTTGTGGG CTTTAGTCAT CCACTATTTG 
25151 TGCCTTACTC ACCCATTGCT TGTGATAGTA TCCACCTAAT TAGAGGCTGC 
25201 CTATAAGTCT CTACAAAAAC TGTACACAGA TGTTGTTATA TCAGATAGCC 
25251 ATTCTCCTAA TTAATCTATA TT3TTCAACTG TCTAGAATCC ATATATGGTC 
25301 AGTATCCTCT GATTATTCCT GGTCATTGAG ACCAACCAGG AAAATATCAA 
25351 ATTATCACTA TTTGTTTTAT CTTCTTTTTC AGCAATGAGC TCATCAACAG 
25401 GGGAGTTAAC TGTCCAAGCA AGTAAGTCAA GTTAGCTTAT ATAAACAAGT 
25451 TCAATTTTCA CATCAGAAAG GACATTTTCA AATATTTGCT CATACTTGCC 
25501 CATCTGTCCT CCAGATTTTC TTTGAGAGAT AATAACTATT TGTACGATAG 
25551 ATTTAAATAC ATTTTTTTTC TAACTCATGG ACTGATCTTT TAGTCATGTT 
25601 CAAQAAAAAA ATTGCCATGG TAACCTTCTG GGGCAATTTG AAGAAAGCAT
25651 TTATTTTTGA TTGGGAATAT TGGACTTGTT TTTCTAATTT TTAAAAATGC
25701 CATAAAATGT ACTTTCTGCT ACAAAATAAA ATAATAAGAA AGTAATCAAT 
25751 AGGAAGGACA TAAAACCCAT TGTCTGTGAC TGACAATTTG TCTGTGAAAT 
25801 ATGCTAAGGT CAGGAGTTCG AGACCAGCCT GACCAACATG GAGAAGAAAA 
25851 CCCATCTCTA TTAAAAATAC AAAAATTAGC CAGGTGCGGT GGCAGGTGCC 
25901 TGTAGTCCCA GCTACTTGGG AGGCTGAGGC AGGAGAATCA CTTGAACCTG 
25951 GGAGGCAGAG GTTGCAGTGA GCCAAGATTG CACCACTGCA CTCCAGCCTC 
26001 AGCGACAGAG TGAGACTCCA TCTCAAAAAA GAAGAAAAAA ATATGCTTAA 
26051 TAGATTCATC TTAATCGCTA ACAGTGGCTT CATTAAATCA CTTCAAATCA 
26101 CTGTGGCCTA AATTTTGAAA GATTTTACAA AAAACAGTGA TGAATTTGAG
26151 CAATGATGTT CATGCATTTG CCTCTGTGAC TTGCAAACAC CCTAAGTATT 
26201 TTTATCCATG TGTTTATTCA TTCAACAATA TCTTTTAACA TCTACCAAGT 
26251 GCCAGAAATT AGACCAGGAG TTGGTGGTAC CATTGTGAAT AAAACATGAT 
26301 CCCTGCTCTA AAATTAGAAT TCCAAAGTAG AGAAAGATAT AAATAAATCA 
26351 GGAAGTATGA AAATAATGTG ATTAATGCTA TGACAGAGGA AGTGCATAGT 
= TTGATCAGAG AGTCAGCTAA CCTGTTCTCA CACAGTAAGA

26451 AAGTGAACCC TGAAATGTGA GAGAGAAGAG GCCATGAATr Xir^.^C^ 
2«« GTCCTOGGCA G^S £££££ 

TGGGGTCATT TCCTGTAATT ACAAGATGTT TCTTATAACT 
r^l^rZ 0 ATCTmTTC AGGTTGTCGT AAACGAGTTG TTCCa"?^ 
2^701 ^ C ** aTG GASTCATTGC ACCCAAGGCG GCCTGGCCTT 

f tl^ ^^ rrC CCTTCAGTAT GATAACATCC ATCAGTCTGG GGcSc™ 
26751 ATTAGTAACA CATGGCTTGT CACTGCAGCA CACTGCTTOT ^Z^^t 
26801 TATTGACCTT AAGTTAGAAC CCACTTCTCC 

26851 CATA' l - iv -i-ir- n«i»mi.™., . J->-i>j<_ iaaaaagccC TGAGTTTTGT 

,„„: rr.^ZT 3 GTAACAATTA ATGTCTCAAA TATTACTGAA GTAAAATAAG 
« 0« ^S^ 1 ™ TTCAGGTTCT TOTCTAAAAT AATCTTACAC TTGCATACTT 
27 6 001 ^ TAACRG T ^ATCCT AGTATCCATC 

27051 CTAtS^ 7^°™ TAATAAGGAA ACTGTGTAAA GAAATCAGAA 
37101 ^T A ZE^ ACATCCTAAC ACAAAATA1T CACTAATAAC ATGTACCATT 
27^ e*** 00 ^ TCTCCACTTA AAACTAGTGT ££££££ 

l^m ^^^T? GGCCAGTCTC ATACTGATCT TAAATAATCA AACTAATTra 
AAAGTAAAAT GGAAATTTTC AATAAATGCC GGAAGTTGGT AAcSstS^ 

2 2 7 7 30 5 l^SS S £££££ 

« £££££ 

27401 TTGGGCCACT CAATGTGACC TTCCCATAAT AGAGCATOTC ^r^^Z
27 O 5 ! 1 Sa^C Sa^T 

S£ ™^ ^Solx AATAACCAGT CATTTTTATC 

276 6 " ^zsr ctaagtagat 

27701 S 

27751 CAGGCAGTTG GAGAGCTGGG CAAATTA^rr- AAATATGTCC 

S s = iH = S 

28101 S CCAGGAGTTT G^S 

AAaS^I ^^CAGC SSgaS 

AA ^^ AA TATAITTTTA AAAATAATCT ACATAAATTC mmr™ 

sss s EF^ 

£££ l^S^ £££££ " • 

Ss ™ ~ = sss s 

28551 AATATCCAAA £££££ 

2SS iJ A ^ ACCTCC AAAGGGCATG 

""J SSSES ATG^SS 
28751 aSa^ 

aggcttaa?a ^a^aI 

28851 ACCCCATGAC ACAAGTTTAr pw»w r^" Mfflr GTACAACAAA 

28901 AACTTTGAAA AAaSI™ AACCTGCACA TGTAACCCTO 

SS sssss HsE i™ ssss 

KS ^aS £££££ 

SiSi S T ™ T 

29201 AAAATACAAA SS^S^ T AAAGCRBaA AGTCTTCTGA 

29251 AGCAGAGAGA GGCAGGCA^ E^^SS* ° AGTOTTrTA TAAAGGGAAG 
29301 TCCAQGCTTT S^S^ 1^^^ GAATCA AA^ GTTTAACCTA 
29351 JSSS aSUSt 

29401 CCCATTQCCC AAGCCCACAA GcS^S A^^ 

29"^ i^S! «W A^SS aS££S 

2 2 9% 5 5°1 1 

29601 CCCATATCAC TTCAATA^AA £2^^° QCTTAATTAT CCAGTAGATT 
29651 CTCTAAAATC TAS^S^ ™ AGCACT TTATCATGAC CTATAAAACA 
CTCTAAAATC TAGTCCCTGC TTACCTCTCC AAGCTCACCC CCAACCATTC 
29701 TTTCCCTTGT GTTCTGACTG CAGCCCATCC AACCCAAGAC CTTGGGATTT
29751 TTGCCTGGAA ACTTGTTTCC CTCATCTCCT CACACTGACC CTCTTTTACT 
29801 ATGTCTTAGC CCAAATGCGT TATCAAAATA ATCATAATGA CCTGTTAGTA 
29851 CTCTATTCCG TTACCCTATT TTATTTTGTT CATAGCCTTT ATCAATGTTT 
29901 AAGATTATTT ATCTATTTGT TTGCTTGCTT TGATCCTTTT CCTTCTCTGG 
29951 AATCTTATAC TCCTGTGAGC AGGCACCTTA GGTCCTGTTC ATCACTTTAT 
30001 CCCCAGCAGT TCAGATAAGG CTCAGCACAC AGATGCTCAG TAAATATTTG 
30051 TGGAAGGGAT AAATGAATGA TATTTTATGT GTATTACAGT TCTAAAATTC 
30101 AATAGTTTTG TATTAAATAT CAGTTCTAAT ATGGCATTTA TATGATTTTA 
30151 TCTTTCAAAA CATTAGCAAT AGATTATATT TAAATGATAA AAGAAAACTA 
30201 TAACTGCAGC CAAGTATTCT CAGGATTGTA TTTCTCTTAT ATTAGCCTAA 
30251 ATGCAATTAA TCTAGCTCAT ATACTTTGGG CAGCTTATAT ATATTCTGTT 
30301 AATTTCTAAC CTTTTCCAGG TATAAAAATC CACATCAATG GACTGTTAGT 
30351 TTTGGAACAA AAATCAACCC TCCCTTAATG AAAAGAAATG TCAGAAGATT 
30401 TATTATCCAT GAGAAGTACC GCTCTGCAGC AAGAGAGTAC GACATTGCTG 
30451 TTGTGCAGGT CTCTTCCAGA GTCACCTTTT CGGATGACAT ACGCCAGATT 
30501 TGTTTGCCAG AAGCCTCTGC ATCCTTCCAA CCAAATTTGA CTGTCCACAT 
30551 CACAGGATTT GGAGCACTTT ACTATGGTGG TGGGTATCTC AGGATAGCTA 
30601 ACAGAGCGCT AAGCCCTGTC TAAGGCAATG TGATTTCATC TCCATCAATA 
30651 TTATCCTGAC AGCCATTTCC ACACAGTCTG GTTGGATTAG TTAGGGTTCT 
30701 TACTTTGTGT GACAGAAATT CAATTCACAT TAACCAGTGC AGAATAAAAA 
30751 ACAAAGAAAC AAAAACTTCC ACAAATTTGG CTCATGTAAT TTGGAAGTCA
30801 AAAAAGTGTA GTAAGTTTCA CTTCAGACAC AGGGGTTTAT ATGATGTCAT 
30851 CTGGCTCTGT GTCTCTGAAT TTGAATTTTT TGCCCCTTCT TTTCTCTATG 
30901 TTGGCTTCAT TCAGAGGGAT GCTAGCTTCA CCTAGTGTCA GAGGTGGCTA 
30951 ACAACACCTC AACACATCAT CCTCAACAAA GAAAAAATAC ATAGAAAGGA
31001 ATATTTATTT CTTTTCTTTG CCAGAATTCA CATTAATTTC TATTGTTCCA 
31051 GCTGTGTCTA GGAGGACTCA GATTGAGTGG CTAACTCAAA TATTCTTTAT 
31101 GCCTATGTAG CAAAATTTGC TTCAGTACTG AAGAAGCTAA TTTAAGTGTG 
31151 ATGGTGAATA AGAATAGTGT AGAGATAAAT TGTCAAACTA TTTGTCCCCT
31201 CTAAAAGTAT TCAACTTGAT ATACTAACTT AGTCTTGTAA GAAATAATGA
31251 TGATTTAGTT ACTGAATGTT CTAGGCAATC TTAGTGAGAC ACGCTCTGGA 
31301 TTCTAACATG TGGTCCAGGT ACATATGTAT AACAAAGCTA GAAAGTTTCT 
31351 TTAACACTGG GCTTGAGAAA ATGCAAAAGG GCTTTCTGAG AATGACTAAA 
31401 TCTATTTGCA GGATTCTATA CAATTTATTT ACATACAAGA AATTATAAAG 
31451 AATAAGCTTT TGATTCTCAG TCTACCATTA AGGAACTAGG AATAACCTTT 
31501 CACTCACATA GGCAGGAATC GGTTTTAGGG TCTCTAGATT TTTTCCAGAT 
31551 GTCCCATGTG GTTTTGTTTT ATCTTATACA GAGTGAGACA TGCATTGCTT
31601 TCTTTAAGGT TGTATTACCA ATCACAGAAA ATATTACCTA TGGTTTATTA 
31651 ATTCTAGTAG ATCCAGTGCT GCTGTAAGCC TGACACCTCC CTAGGTCTGC 
31701 ACTCTCTTGG ATGGATTTTC TCTGAAGATA GGGCTTGCAT TCTCTGCTTC
31751 AXAGTGGTGG GAAAGACATC ACAAATCCCC TTTGGCTTGG TGGGAAAAAT
31801 CACTTTCAGG AGTTTGAGAC TGGCACAGAA ACATACCTGT CATAATGCGC 
31851 TGTGAGTGGC AACAGAATCT GACACTTATA GAGCACTCCA CCCTACTTGA
31901 ACACGGCCTC TCTTGGTGAG TGACCCACAG GTGCTTTTAA TCTATTAAAT 
31951 AGATTAAATT AACCTATCAT TCTTAATCTG TTAAGTACAT TAATAGATTA 
32001 AAAGCAGCCA TTCGTTACTC ACCAAGAGAG GCTATATTCA AGTCTGTAAA 
32051 GCAAACCTTA AGAAGTTTTT TAAAATTGAA ATTGTACAAA GTATATTCTC 
32101 TGATCATAAT GGAATCTAAC TAGACATCAG TAACAGAAAG ATAACATAAA 
32151 AATCCCCAAA TGCTTACCAA TTAAAAAACA TATGTAAATA AAGAGAATAT 
32201 CTCGAAGAAA TTTGTAAAAA CAAATAGAAC TAAATGAAAA CAAAAATATA 
32251 TAAATATATG CCAGATGCTG CTAAAATAGT GTAGAAAGGG AAATTTATAG 
32301 AAAATGCATA TTATAAGGAA AGATATCAAA TCAATAATTA AGTTCTCACT 
32351 TCAAGAAACT AGAAAAATAA AAAATAAACC TAAAACAAAC ATAAGGAAGG 
32401 AAATAATAAG AATAAGAATA GAAATGAATA AAATTAAAAA TAAACTATAG
32451 AAAATTGATA AATAAAAAGC TGATTATTTG ATAAAATCAA TATTTTGCTA 
32501 GAAATGTCAT TAAGCATTTT TACAGAAGAT GAGATATAGC TCAGGGATGT
32551 CCAGAATTTA TGGGCTATGC TTTTCATGAC TTGGAATACA TTTTACCAAC 
32601 CAGTTTAGTT TGCTGAAGAA GTTGTGGATT TGGACTGTCA CCTACTTACA 
32651 ATACTTAGAT TGTCAGTTTC ACCTTACTCT TCTCACCATT ATTTTATTTT 
32701 TATTTTTATT TTTATTTTTA TTTTGAAACA GAGTCTCGCT CTGTCTCCXA 
32751 GGCTGGAGTG CAGTGGCGTG ATCTCGGCTC ACTGCAAACT CCGCCTCCCX3 
32801 GGTTCACGCC ATTCTCCTGC CTCAGCCTCC CGAGTAGCTG GGACTGCAGG 
32851 CGCCCACCAC CATGCCCGGC TAATTGTTTT GTAGTTTTAG TAAAGAAGGG
32901 GTTTCACCGT GTTAGCCAGG ATGGTTTTGA TCTCCTGACC TCGTGATCCA 
32951 CCTGCCTCGG CCTCCCAAAG TGCTGGGATT ACAGGCGTOA GCCACCGCGC 
33001 GCCAGGCCAT GAATGTTTTT AATTGATGAT ATAGTAGGCA ATATAAATGT
33051 GTGTGTGTGT GTGTGTGTGT GTGTATAATA TATATAAACC AATTGTATTC 
33101 AAATAACAGA ATAATTTGAA AAATCTCTTA GCATATTTCT GAGTTACACA 
33151 CTTAAATCTT CCGAGCACTT TTAAATATGT GTTTACAAAC ATTTCTTCAG
33201 AAATAAATCT TGGAAATCGT CTTCTAAAGA AACTGGTGTA TTAGGGTTTT 
33251 TTCAAATGTA CTTAGTTTTT TTTTTAATTG ATGTATAAAA TTGCATGTAC 
33301 TTACCATGTG CAACATAATG TGTTGAAGTA TAGTATATGT ACACTGTGAG 
33351 TGTTAAATCT AGTTAACTAA GAAGCGTCTT ATTTTACATA ATTATCATTT 
33401 TTGTGGCAAG AACACTTAAT ATCTACTCTT GTAGCGTTTC TCAAGAATAC
33451 GATATATCAA CAGTAGGCAA CCAGAAGCTG GGGGTCTTTA CAGGGGAAGG 
33501 AGTTAGGGAG ATGCTGGTCA ACAAATTCAT ATTTGCAGTT AGGAAGAAAA 
33551 AGTTCAAGAG ATCTCTCATC CATCATGGTG ACTATAGCTG ATGATATATC 
33601 GTATTCTTGT ATTAGTTTTT TATAAATGTG TAACAAATAA TCACAAACAG 
33651 TTAAAACAGC ACTCATTTAT TTTTATCTCA CTGTTTTCAT GAGTCAGACG 
33701 TTCAGACACA GCTTAGTTGA GTCCTCTTCT CAGGGTCTCA CCAAACTGTA 
33751 ATCAAGGTGT CAGCTGGGGT TGTGGCCACA TCTGTGGCTC CTTTGAAGGT 
33801 CTCCTCAAGG TTTGCTGGCA GAATTCCTTT ACTCGCAGCT GTAGAATGCA
33851 TGCCAGCTTG CTGCTTTAAC TCTTTAGGAA AGTGTCTCAA CTCCAGCAAG 
33901 GCTCGCCCTT TTTGAAATGG CTCAGCTGAT TAGGTCAGGC CCACCTTTGA 
33951 TAATCTCCTT TTGATGAATT CAAAGTCAAA CTCATTAGAG GTCTTAATCG 
34001 CATCTGTAAA ATTCCCTCAT CTTGGCCATA TAACATAACC TAATCATGAG 
34051 AATGGCATCC CTCATATTCA CAGATCCTGC CCATATTTGG GAGGAGGGGA 
34101 ATCACACAGG AATCTTGGGG ACTATCCTAG AATTCTGCCA ACCATGGGGT 
34151 CATGGTTTCC CAATCAATAT ATGGTTTGGT ATAAAGAATC CCTGAATGCT 
34201 TGTGCTATTC TTASTTTTCT ACGTAGCCTG CCATAATAAT GGTTTCTAAA 
34251 ACTCAGAACC TAGCTTACAG TCTGCAGCCA CCAACTTGTA ATACATTGGA 
34301 AGTGAAATCA TTGCOGTTTA ATGCATTTAT ATATATATGA TGTATAATAT 
34351 ATGTATATTT CACATATATC TTATATATGT GAAAGCTCAT CATAAACTTT
34401 AAATAATAAA ATAAATGTAC ATAGTATTAT AGGCATTTTA TCAAGCCAAT 
34451 GGAGAAAACC ATCTAGGCAT GCAGAGTTTC TGGGAACAAT CTGGAACCCA
34501 CAAATAAAAG CTTTACAAAA GATAAAAGGC CTTCCTGAAA TATATAAGCT 
34551 GATTATTTTT AAGGTTAGAT TTTACCAGGA AAAAGAATCC AAATGGCTTT 
34601 CTTGCTTTGA GAAGTTTTTA TAAAAATGTG ATTGGACAAT AATTATCGTT 
34651 AGATGTGCCA GATTTAACCA GAAATTCTTT TTTCTAGAAA CTGCTTATAT 
34701 TAACTTCATT CTGTATTGAC AATTTTACCA TGAAAAAAAT ATTAGGAAAG 
34751 TCTTCTCACT TCACTCTAGC CAAAGATGCT GATTGTAAAT ACTAGAATAA 
34801 CTCTATTTTT CCTTAAGGGG AATCCCAAAA TGATCTCOGA GAAGCCAGAG 
34851 TGAAAATCAT AAGTGACGAT GTCTGCAAGC AACCACAGGT GTATGGCAAT 
34901 GATATAAAAC CTGGAATGTT CTGTGCCGGA TATATGGAAG GAATTTATGA 
34951 TGCCTGCAGG GTAAGTTGGA GGQATTTTTT TATATTACTA ACTCAAAAAT 
35001 TTGTATCTGG CTTAGAATAT ATTATATGTT CTTTACATAA GGACAAAACA 
35051 TAGATATCAT CTCAGCTCAA AAAAGTTACA AATGCAAATT TCACAGCACA 
35101 AAATACTTTT AAATGTTTTA TTAAGATAAA TGAAGTAAGA GTTTCTCTGA 
35151 TGCTATCAAA CAAACAAAAT TAGAATTTCT TAACCAGAAA TCCAAAGATT 
35201 AATAAAGCAG TTTATTTTCT CAAGOGGCTC ACATTCAAGA AAGAAAATAA 
35251 TCATAAACAG AGAAGTATAA AGTGATGTXA TGAATAATAT AATGAAAAGC 
35301 AAATATTTTT CTTGAAGGAA ACATTTTTGG AACAAGTATC AGAGAGATGA 
35351 GACGTAAATA AGGCCTGAAG AATAAATAAC ATCCAATTTC AGAATAAGAA 
35401 AATAATGTTA TAGAAAAGAC AAAAAGCATA GCCAAAATTA TGAAGGTGTG 
35451 AAATTACAAT TCATATCTGA GGGAACTCCA AGTAATTGGT TGGGTCTCAG 
35501 CATGAGGAGG ATGAGAAGAG AAACAAGTAG ATAACCATGA GAAGGTGGAT 
35551 TAGGCCATGT TGTGATTCCA TGGGCCCTCC CCAGTGCCCT CATCTGCCTT 
35601 CTAACATGGA TGTTTTCCAG CGAAGGTACG TTTCTTCCTG GAGACACTTG 
35651 CTTTTTAACA TGAGATACTT TAGAACTCTA AGGAGGCCAC TCTATGTGGA 
35701 AATGATGGAA TGGTATTGAT ATCAGGTGGC AGAAAGTCCT GTCCAGAGTC 
35751 CCACAAACTG TACCACATGT GCGACCTCTA TCAGAAAAGG AGCAGGGACC 
35801 TATGTGACAT AGAGGCTGGG CAAAAGCAGG ATCTGGTCCA CAGCCAGCCT 
35851 OGGTTGCTAA TAATGTGGAG GGAGGCAGGC AGAATTTAGG GATTCCAACA 
35901 AAAGGTCCAT ACCACGGGGA ACAGGTGGAA GGTGCAGGAG TCTTGGAGCA 
35951 GACAGGACCG GGGAATTCAG GTGAACCATG ACATTACTGA AAAGCCTTAG 
36001 GAGGGATTGG TGGTGATAGA GATGCTTCAC TGGATTGGGG AGCAGAGGTA 
36051 AACTTGCTGC CTAACTGTGC AAAGTAAGTG ATAAAACAAG GCTTTAGTCA 
36101 TAGAAAAATA CAGTAAGTTA TCAGGGCAGC GGTTCAGGTA CAAGGATCCA 
36151 AGACAGGAAT ACAGTGATTG TAATTGGGGC ACATGGTGAG GGGCCTAGTC 
36201 TGATACAACA GAAGTGCAAG CACCACCAAC ACCTCGTCTT TCTCCATAAG 
36251 TCTTTCTCTC CAGAGCCCTC ATGACCTAAT CACCTCTTCT TAAGTCCCAT 
36301 CTCTCAACAC TATTGTATTG GAGATTAAGT TTCCCCAACC TATGAACTCT
36351 TGGGCTCACA TTCAAACCAT AGCACCACCC AGCACAAAAG CACAGAGCTT 
36401 CCAATCTGGT TTCTAGCTCC ATACCCTAGA ACCAAACAGT AAGAATCACC 
36451 TCTGGAAATG TAGCAATAAT ATAATCATAA TTTTTAAAAT CCAGTGGAAG 
36501 GATTGQAAGA TAAAATCAAG GAAATCTCTC AGAAAGAACA ACAACAACAA 
36551 AAAAGACACA GAGGAGAAAA ATAATCAGAA AAATTAAGAA AACTAGAGGA 
36601 TAAGCTCAGG AGATCCAACA CCAAATGAAT AGGAGCTCTG AAAACATAAA 
36651 ACGCGAGTGT ACAATATAAA AAAAAATAAA GAATGCTCCT AGTTCTGAAG 
36701 CTTACATGCA TCCTATTGAA GAAAAGGTCC AAGTAGTGCT GGGCACAATA 
36751 AATGAAGTAC TTCTTTCCAA GACATACCAT CATAAAGGGT CAGAAGCCAG
36801 GGATAAGGAG AACAATCTTA AAACTTTGAA GGAAGAACCA TCAGAACTAC 
36851 ATAGAACTCC TCAACAGTAA CTCTAGAAGG TAGACGATGG TGGAAAACAC 

HIV! al^^I" CAAAGGGAAG ATTATTTCAA CCTAGATTCC TACCCATGCT 
36951 AACTAAATAT CAACTGTGAG GGTGGAATTA AGAAGTTTAG ACAAGCAATG 
l 7 ™* ^^^^ ATGTACTTCT GATACCCTAC TTCTTAGGAA ACTACTTGAG 
37051 AGGGTACCTC AGCAAAATGA GGGAATAAAT CAAGAAAGTG GAAGACGTAA
37101 GACCTGAAAC TGTTAGTCCA ACACTAAAGA GTGGTATCAG ATAATCCCAA 
11^ ? A f CA!I!AGCT CrroCACCAGG CTTAAAGTAA CCAGCTCGAA TTTGAGCAGA 
37201 AGTAAGAAAA GATTGTGTGT ATGTGTATGT GTATGTGTGT ATGTGTGTGT 
37251 GTGTGTGTGT GTGTGTTGAT ATGGTGGAAC AGCTTCAGAG GAAGTAAAAG 
37301 AACTAACAAG CTATCTGATG TCCTTGAACA TTAGTAAACA TTATTGTGAG 
37351 GTGTTGGTAG ATCTTTTGGA GCATTCAGCA TTTACCAGGT ACATAGAAAA 
37401 CTATCCACAT GAAAAAAAGA GTTGTGTTAT TAATTCTAGG AAAGCAAAAA 
37451 AAGATTTCTG TAATCCAAAT ATGTTACTTG ACTCTTCAAT TAATAAAATT
37501 TACACACTGG TACTAAATGT AGGCTGTTAA TTTAACCAAA AATAGAGATG 
37551 CTATAATGTA AAGATGTGGT GTGGAAAAGT TGCAAAGAAG TTGTAAAACA 
37601 ACTAAATCCC TAACTACGTA AGAGAAAATA AATATTTACT GTCTAAACCT 
37651 AGAAGCTGTA ATTTGAGCAT ATTATCTAGT GATAAGGAGT TAGATACTAT 
37701 AAGAAATCAT TAAACAAGCA TGAAGTGGCT ACCTCTTGGA GAACAGCTTG 
37751 CGTGAGGTAA CATGGGACAT AACTGCTTTT CAAGCCTCTT CATGTTTTTT
37801 CGTTTTTGCC TTTTTTAACT AAGTGCTGTT TACTCTAACA AAATAAATTT 
37851 XATTPrTTAA ATGTGAAAGT TGAACCTTAA GGCTCTTTGT AATATTAAAA 
37901 TCCATGTCTC AATTAATTAT TCTGTGTTGA TAGTCTATAC ATCTACTGTC 
37951 TAGTAACAAA ATATGTGATT CATCAAAATA TCTTAAATAA TGAGCTTTAT 
I*™* 2™™* T1TTCTTTCT TTTTTCTTAT GTTTTTATTT TXAGGGTGAT 
38051 TCTGGGGGAC CTTTAGTCAC AAGGGATCTG AAAGATACGT GGTATCTCAT 
lli^ I?^ ATTGTA AGCTGGGGAG ATAACTGTGG TCAAAAGGAC AAGCCTGGAG 
«ont AGTGACTTAT TACCGAAACT GGATTGCTTC AAAAACAGGC 

38201 ATCTAATTCA CGATAAAAGT TAAACAAAGA AAGCTGTATG CAGGTCATAT
38251 ATGCATGAGA ATTCAACTAT TTAGTGGGTG TAGTACAACA AAGTGATATT 

38351 ^SS^ ^^ AAC ATGAAACACA CAACGTAAGT TATTTAGAAT 
38351 CACTTTAATC AACCAATAAT CCTTAGCCAA TTTAXAAGGG ACTTTTATTT 
38401 GTAAAGTAAT GGATCTGGCT TGAAAAATAC GGTAGAGATA CTTAGCTCTT 
38451 TAAATCACGA ATGTTGAAGT ACCAGTGAGA CTCAATACAT ATTTTTGAAG 
H 5 °l i^S^ GGATTTTTAG AATGTCGTTG TCAAGGGTCT CCTTTTAACT 
38551 GAGAAACTTT TTGAACTCAC AAAGTGTTCA AGAAACCCTT GTATAATTCC 
lllll CTOGAGCTCA CAAATACTTT TTTTtSS T^TA^S 

38651 ATCAGATTTT CCAAAGTACC TTTCCACCAT AAGAAATGAA TTTTCTACTT 
l Q ™l CCAATAAAAG AAAGTCATAT GTAGGAAACA 

38751 AAGTCTGATA GTAAAACAAG CCAGAGATCT TCTAACTTTT TTTAGTTATA
38801 AAACCTCTAA TTTTTGGTGA CTTTTCTACA CACACACAcI (SEQ ID NO:3)
Exon:   30320-30579
Intron: 30580-34817
Exon:   34818-34960
Intron: 34961-38044
Exon:   38045-38203
Stop:   38204



CHROMOSOME MAP POSITION: 
Chromosome 4 

ALLELIC VARIANTS (SNPs) : 
DNA 

Position Major Minor 



72 
1894 
1897 
2123 
2124 
2648 
2805
4036 
5056 
5445 
5608 
6243 
6273 
6294 
6312 
6506 
6714 
6815
6994 
12478 
13493 
13522 
13916 
13974 
15081 
15907 
17884 
17908 
20551 
21222 
21232 
21353 
21904 
22132 
22369 
22742 
22882 
23316 
23867 
23954 
26548 
26573 
27400 
27788 
28069 
29269 
29537 
29726 
30496 
30695 
30752 
30849 
DNA

Position 

72 TTATMTCATAAAAGTAGGCAGTAAGTTGAAGATTT^ 
AGCTTTAACCT 
[A, G] 

TGGCTTCTGTAGCTTTTGTAATC^^ 

TCAAAAGGAGAAAaurrCTAACMCOTATCACC^ 
/^SATGCTCACAGCTTCTTCCGTGGGATTTGA^ 
ATGTCAATGGGTATTGAACGACTCTTCAGCTC^ 
TGACTATGTGTCrrroGTGGTGGGAGA^X^ 

1894 ACCTGGGCCCTTAAACT^TATCCTCICT 

TTGGGTCTTAGCGGTAAAAAGATGAAC^G^ 

AAGTGAACMACATGCAAACTAATACTTGAl^ 

AGGTGCXIAGAAGAAC7^GCAAAGAGTTATTTTT^^ 
[C,T]

CCOSGTGTGATGCAATATA^ 
TTTAACCAAAATCTAATCAAGACTTCAGAGCTAA 

TAGGAAATGAGGGATATAAAAiSAACAAGTTAAATAAT^ 

GTCC3U3AAAOTAAGA1MTCTAAAGGATG^ 

TTAAAAACTAAAAAAGAAXK^GGACTCri^^ 

TCGGCCCTTAAACAGATATCCTCTCTCTC 

TCATTCCTOCCTGACTCrrCATAGATTO 

GGTCTTAGCGGTAAAAAGATGAACAAGGCXA^ 

TGAACATACATGCAAACTAATACTTGATTCAA!^^ 

TGCCAGAAGAACAGCAAAGAGTTATTTTTTC^ 
[C,T]

GGTGTGAIXK^TATAAAATACACAGCACCAC 

GAAATGAGGGATATAAAAGAACAAGTTAAA^ 
CAGAAAGTAAGATATTCTAAAGGATGTTTA^ 
AAAACTAAAAAAXSAAGCA^GACTC^^ 

TTGCTGAraTAGGTIX^^ 
AAAATTTTTATCCCCGGTGTGATGCAA^ 
TTGCCAAAJGAATTTAACCAAAATCTAATCA 
^TC^TTTATAGGAAATGAGGGATATAAAAG^ 

^^TAMAAACAACHXSCACTGCaTGGTCCTC^ 

CCTAATTGTCTATAATAATQATTTGGTAAAAAGTCACQATQTTTTATTTCA 
2124 TGCTG^

AAATTTTTATCCCCGGTGTGATGCAATATAAAATAC^ 

TG CCAAATGAATTTAACCAAAATCTAA.TCAAGACTT CAGAGCTAAAGAAAATCTAAAGGT 
AATCCAATTTATAGGAAATGAGGGATATAAAAGAACAAC^ 

CATTCAGACAAGTCCAGAAAOTAAGATATT CTAAAGGATGTTTAGCTTGATCTCTTCAAC 
[G,A]

GTCAATGTCATTAAAAACTAAAAAAGAAGCAGGACT 

AGGCATAACAAACAAGTGCACTGCATG<TrCCTCX3AT^ 

GTGTAATTATAATCAAACCATG<3AGGGAACTT^ 

GCAGAAATATCATTAATTTTTTAGGLAGTGTTAAG^ 

CTAATTGTCTATAATAATGATTTGGTAAAAAG^ 

2648 GTTATGTTGGATATATCCTAATTGTCTA^

TATTTCACATTAAAATATAGCAGCAGAAAAAATAAATGAGCC^^ 
AACAATTGATATAATAATGTGATATATATATGGAT^^ 
TTTTTATGTCTGAACATTTTCATAATACTTAAAAATAAAAGATAAAA 
GAGATAATAGATTTAAAATCACTTTGTAAACT 
[A,C]

CAAAGTGCTGGAGAAAGGAGGAATGGTCCCTTTTCAAGC 

GCTGCTAAGAGAAACCATTCCTGACCACC^C^ 

AAAGCAGGAGCAACATTAGGATTCCCAGATCCTGA^ 

AGACCAAGATGACMTCAACA^ 

AACrTTGAACTTGTCTAAGGAGAGCTGGAA^ 

2805 TCAAII^ATACTATTCTTAGTAATTTTTTATGTCTG 

AAAGATAAAAGATAAAAATAAATGAGATAAT!AGAJTTAAAATCAL"1'1"^ 
AGGATAGACAGATAAAAGAGATAACAAAGTGCTGGAGAAAGGAGG^ 

AGCATGT ATGCCACCTTGGACCATGCrrG<rTAAGAGAAAC CATTCCTGACCACCACAAAGA 
GGCCACCAAATGCCTCTAAAATAGAAAGCAGGAGCAACATTAGGA^ 
[A,T]

TTTTTTTTTTAACACATCrrCT^ 

TGCAGGGAAAGGTAGGCTACAGCAACTTGAACTTGT^^ 

AGCATTGCTATOTSAGAGTAACCAGTG^ 

CACCCGAAGCAGAAATGCTGAAGCCATGGATGATTGCXXn^ 

4036 TTCTGGGGAGAATGCAA^ 

ATGGATGTATGTGATAAAACAAATAACTCAGGCTG<^^ 

TCACCTTCACAGAGTCAATGGGGGAGCAAAGAOT^ 

ATCATAGTCCTAGGCCTTATATQA^ 

TCTCAGGTCACTTTATTTGGTTGCATAAAAGTCT^^ 
[A,G]

CAGTATGGATTATATGGGTAAGTAATCA<3GAITGTCCAA 

ATTTCXXACTTAAGATATA!ItKXrrTCC^ 

TCGA^CCCACT^ 

TTTCCCATTCTCTATCTTTAACTCTCT 
GATTACCAAATTCCTTAGGAGTCTCAACTGCITTCCTT^ 

5056 GATATAAAACATTAACTGTTATTTTTTAAATAAAACT^ 
ATATTCAAGATTTATATTGGOZCCATTGTAATT^ 
TTAGTTTCCTATTTTTCATTC 

TOAAAATTTrAGATCCACAAATCAATAACAATTTCGG^ 
AGGACTTACGAGAGACGACOSAAAATTTGGTG^ 
[-,A]

TAATGQU^CTGGAAGGGATTTTGTGGA^ 
ACOGCCAACATTAGAATCAtTCTTGCAGATTGCT^^ 
GAGTATGATGAGATGGGTAGKSTGGGGAGAGGAGAGTAAGG^ 
TGGGTGATTCTAATAAGCCTCTtXn'TrCTA 

TGAGGCCAAGATATCX^AGCCCXTITTCTTC CXTCAATTCt^CCACGTTTCCC CTGTAGAAA 



5445 



TTGCTAGGCCCCATCCCAGACCTGCTTAATCAG^ 

AGGAGAGTAAGGGAATCTGCATGTCTAACAAATGGGTGATTCT 

AACTCAGCTACCTTATTTAAAGGTAAGAGAATTGAGGCC^ 

TCCCCAATTCCACXACGTTTCCCCTGTAGAAAAGCCTAA^ 

TAAGTCCACACACTTGTTTGTAAOACCACATT^ 
[C,T]

GTTCATCTTGTAAGTATATTGATAAAGACAAA^ 
TCAAATGCTAATAATTTTOT 

CACACTCAGTTGTATJ^TCATTCCACTCA^ 

GTTCAAATTAATTTTTCTAACAAGGAAGCACAGAAGCAGA^ 

AAATGACAAATGTATTGG TITC^ I TTA ATC^^ 

5608 TATCCTAGCCCCT^^ 

ACCAAAACTAGTTTTTATAAGT CCACACAL-'ri 1j 1'1'IXSTAAGAC C^C^CrTTT AAGATTTTG 
A GTATT TTCAGAATTTACGTTCATCTTGTAAGT 
TATTTTGTAGTAATCAAGTCAAATGCTAATAATTTTGTTAA^ 
T CCCAAAAAGAAAAAAAGCACACTCAGTTGTATAATCATTC 

TCTCACTCAAAAACTAGGTTCAAATTAATTTTTCTAACAA^ 

TTATTTTAAAAAGAAAGAAATGACAAA^ 

TAAGAC^CTTTCTTTCCCAAAT^ 

CATAGTATACCTAATGGCATCATATTTACAATAAT 

TCAGTTAACATTAAATC^TTCACAATTTCTTAATTTTC 



6243 



6273 



6294 



6312 



TTCTTTTCACATGCAGAGCATCT^ 

AGGATGAACGGGGAGCCTGCACCAATACACCCAAATACC^ 

AGTGACTCCACATAACCTCCTCX^TGCAAAAAGAGAA 

AAAGATAAACACACCTTTOAATGATGGAAAATX3OT 

T l\a ruTi' CATTTATATTTTATGGCCAACATTACTG CTACTGTTGI'lXj'l^IX^rAAJGTTAACTA 
[G,A] 

GCAATTCTGTCTTTACTGAAGTAAAC^ 

AGAAATGCAGAGGTGCATGTTGAACAGAAACTCTATTTA 

CCTAAGCATGTGTTCOTCAAAGGCTAAGGCTAAGTTA^ 

CKXTrACCTGCAAGGCCCTTCTCl 

GCATAAGCCCITACXXTCrcCCCriTGCAG 

AGCATTTGC^TCAGTTCTTAAGTTATGCT 
CCAAATACCTTCTCTACTCCTCCACTrcCTAAGTGACTC 
AAGAGAAAACTCITAACTTGCCITAGTTAAA 
ATGTTACAATTTACTGGGAAATTTTGAAAIL^L"^^ 

[C,G] 

AAGAATGCAATAG 

CTCIMTOAAAAGTGGAGTTTTAAGTTTCACCT 
CTAAGTTMGTAAGGACACATTA^ 

ATTATTTATTTATCCTCCTTTATCACCATAGCA.TAAG^ 
AAATCATTCTATGTTTCATGTGGTATTCTTTTGT^ 

AGTTATGCTAGGATGAACGGGGAGCCTGCACCA^ 

^^^T^CTAAGTGACTCXIACATAACCTCCT CXiATGCAAAAAGAGAAAACTCTTAACTTGC 
CTTAg TT AAAAAGATA AACACA^ 

TT^raGAAATTTGTTTCATTTATATTTTATGGCC^^ 
AGTTAACTAGGCAATTCTQTCTTTACTG^ 
[A,G]

AGAAGTGAGAGAAATGCAGAGGTGCATGTTGAACAGA 
TAAGTTTCACCTAAGCATGTGTTCCTTC^^ 

TATCATC^TGGGTACCTGCAAGGCCCTTCTCTGGTTGT CATTATTTATTTATCCTCCTTT 

ATmc mTAGmTAA GCCCrCTACCCTCCC^ 

GGTATTCTTTTGTTTGTATTCATTCTTACAAAAA 

GGGGAGCCIXXZACCAATACACCCAAATAC^ 

ACATAACCTCCTCGATGCAAAAAGAGAAAACTCTTAA^ 
CACACCTTTCTaAATGATC 

TrATATTTTATGGCCAACATTACTGCTACTGTTC 

GTCTTTACTGAAGTAAACGGAOUUSAATGCAATAG 
[A, G] 

QAGGTGCATGTTGAACAGAAACTCTATTTAA 
GTGTTCCTTCAAAGGCTAAGGCTAAGTTAA 

CAAGGCCXrTTCTCTGGTTGTCATrATTTATTTATCCTC 
CTTACCCTCCCCCCTTGCAGGtAAATCATTCTATC 

TTCATTCTTACAAAAATATGTTTTGCTATTTTGCGTACA 
6506 CAACATTACTGCTACTGTTGTTGTTGTAAGTTAACT

AAACGGACAAGAATCK^TAGGTCTTAAAAGMGTGAG 

AACAGAAACTCTATTTAAAAGTGGAGTTTTAAGTTT CACCT AAGCATGTGTTCCTTCAAA 
GG CTAAGGCTAAGTTAAGT AAGGACACATTATCATCATGGGTAC CTGCAAGGCCCTTCTC: 
TGGTltJrCATTATTTATTTATCCTCCTTTA 
# w J 

TTGCAGGAAATCATTCTATGITTCATGTGGTATTC 1 1 l"l"r G TATTCATTCTTACAAA 
AATATGTTTTGCTAmTGCGTACACT^ 

TTTTGTTTCATCTCTTTTTACTGAGAACTTTTTAAAAG^ 
TTAGTITATTGCTGTTAGCTCCTAATTCATAGTGTGTA^ 
CATGCCAAGAAATGCCACACTAAAOU3ACTCCT 

6714 TTATCATCATGGGTACCTT30UU3GCCOT 

TATCACCATAGCATAAGCCCTTACCCTCCCCCCTTGCAGGAAATCA 
TGCTATOCTTTTGTTTGTATTO^^ 

TGCTTOAACTTACAri^^ 
eix-iyrAAAAGATATATGTTACrEAAA 

CTCCTACTTACCCCCTTATAGACCTATBCAAGTACTTOTGGMGCA^ 

WGAATOTACaTaTACTTAACTTGACCAATTK3TG^ 

CTCAGTGTG<^CGCCCATCTACAATGCATGAGGATTTCTATGT 

ACTTAGTGTCTTAGTATGTTTAGGCTACTACAACAAAAAATACCATAGGCTGGGTATCTT 
6815 ^^TTCTATGTTTCATGTGGTATTL-X 1 1 'lt j 'l *l"l 'GTATTCATTCTTACAAAAATATGTT 

^TGCCACACT7UUU3VGaurrCCTACTO 

AAGCAGAATTACTAGGTCATTGAATGTACATATACTTAACTTGACCAAT^^ 
TGCTCCTCAAAATGG^TGACTTCAGTGTGCAC3GCCCAT(^ACAATGCATGAGGATTOCTAT 

AtraCCAAGATGAAGA3X3ATCAAGGCTCTAGCAGATGTCTGGTGAGAGCCTG^TTTCTC^T 
*»AATOCC»CACTAAA 

GGAAGGAGAATTACTAGGTCATTGAATGTACATATACTTAACTTGACCAATTGGTGCA^G 
TTTGCTCrrCAAAATGGerGACTCAGTGTGC^^ 

ArarccCACATCTAACCAACACTTAGTGTC^^ 

TACCATAGGCTGGGTATCTTAAACAACAAAC^^ 
GATTCCAAGATGAAGATGATCAAGGCTCT 

GCCCCATCTCCTCTMCATC!^^ 

TTCTCATTTCCCTGTATCAGTTTT^ 
^AAOATOAAGGAAGCT^ 

TCTATAGTT«5AAGAAAGGTGTG^ 

AACTAATATGAACTATTTCTAAAr^^ 

AGGAGAATCTATITAGTTTATCATCAT^ 
[T,C]

AAAAGGCTAAGAAAAATGATTCTCTCTCTCT 
ATCCTGGTTTrAAATATTTTTAOTACA^ 

TTTTACAAAAGCAATTCAAAgATCTAOj 1 rX -r i Vl-l^ TACACTGAOAATTAATACTTTTT 
TCTTTAAAATCCTTAATTGCAAATCrrTAAA 

^TCCAAAAAAAAAAAAAAA^ 

GA CTOC3VC TAAAAACTACCAAGATTATGATTCITATTTTTGGAGAClTAa tr.»» ft ft TftflfiC 
^CCTTO3QAaAGGaOTGCAACA(mTCT^ 

GTGGGTAGaAGGTCTTAGTGAGAACCrACCTGCATGCTCATCCKlAGCT 
AGGCC^AACA^CTGAAGCTACATGGCC^ 
13522



13916 



13974 



15081 



15907 



[T,G,A,C] 

TGGGCAAGTCACTTCCTCTTCTATGAAAC 

TGATTTGAAAGCAAATGAGCTCAAACAC^VATGAC^^ 

ACAACAGTGATTCCCACTATTATAATTATTACAGTCT 

ATAATCAATTACCTAAAATGTCCAAAAACAGGAAAAAAA^^ 

TGTAATTTTCTTTTTTCTCTA^ 

GTTTTATAC^TATGATACGAACTTAAAAGGACTGC^ 

TTCTTATTTTTGGAGAGTAAAGAAAATAGG CTGCCTTTGGAGAGGGGTGCAAC^VGTTT CT 
GATCCTCTTACAAACTGCTTGCTGCCCATCAGTGGCT 

CTGCATGCT C^TCCTGAGGTAGGCACTGTGAAGGCGTTAACAGG CTCTGAAG CT ACATGG 

CCCTGGTTTCACnXSAACrCTGTGGTGTCA^ 

[C,G,A,T] 

GTGAATAATCATAGTACTCACCTTAGAGGGC^ 

ATGACATCrTCTGCTTGGlX^TATATGGCAGACAA 

TACAGTCTTACCAAGGAGGAGCTTTCCACAAATAATCAATTACCT^ 

AGGAAAAAAAAATCTCTTCCGATAATTCATGfTGTAA 1 T l ICri ' mT CTCTAGGAGCATT 
GATCT'CAACCTGATGTAAAGCAAGCACTTTAAAAAGTC^ 

AACAGTGATTCCCACTArTATAATTATTACAGTCTTACCMG^ 

AA XCAATTACCTAA AATGTCCAAAAACAGGAAAAAAA 

TAAI ITllTlVrr m^^ 

AAGTCTTATAAAATTTTCCrrcGTAAATGCAAAAC^^ 

TTATC^'l-riVl-l'AATTCAACAAAAAaTATACTA 
[T,C]

TAGATTTTATAGACTATGAAAAGATAAATTGCCATCTCT^ 

TAATAAAAGAGACTATATATTTGCATAAATATATAGTG^TATATTGCATAAATATAT^ 

TATATGTTTACATTAAAGAATAAAAGGTATAAGAGGGATAAGAAAAA 

AAGACAGGTCAGTTTGAGATTAACGAATATCCCX^^ 

TTGAAGGATAGTTGTGATTCAGGAACACAGAACTT^ 

A37AA TCAATTACCTAA AATGTCCAAAAACAGGAAAAAAAAATtr^ 
TGTAATTTTCrrTTTTCTCTAG 

AAAAGTCT TATAAAA TTTrCCTGGTAAATGCAAAACrm 
TTTTATOkATTTGTTAATTC^CAAA 

TGCTAGATTTTATAGACTATGAAAAGATAAATTGCCATCT^ 
[A,G] 

T^^AAAAGAGA 

AATATATGTl'iAC ATTA AAGAATAAAAGGTATAAGAGGGATAAGARAAAT^ 
GGAAGACMGTCAGTTTGAGATIAACGAATATC^ 

CCTPGAAGGATAGTTGTGATTCAGGAACACAGAACTTTGCAGAATG 
ACCAAAGGAACAGCCTGAGAGGCGTGAGTATGCAGGAAA^ 

AATGGCTGGGCGAGTCTGTTTCTrTGAGT^ 

ATCCAACTAACCTTCZAATTGCCCTCTTG 

AAAATTATCAAGCAGAAAGAGAOTUnACCCTG^ 

GACAAACTCCAACTAC7^AAATTCT7AGAAATGrcCCTA 

AAATTGCTAATGCTATTAGGTTGTATAGATAACAATAGATT^ 
[G,A] 

CTTTAAATATATAAGTTTCTCTOAAAC1TCTGGGAACTTGGA 

AAAGAATGCTTCTAATAATGAAAQCCATCATCTG CX7VTGGAAACAATTTCAGGGTCTTTA 
GAA^GCTAGTTT ATACATAAGCTCCATTCTACAATAAAACI^ATGTTCA^ 1TTTTTCTG 
ATTTTCCrCCTGCTGTAAATTCATT^ 
TTTCTCAAAGCGTTGTCCTCMAOT 

AAACTTCATCCAATGCCITCACCAAAAAGTTACAAA^^ 

ACTTATTCAGAGGGTAATTACAAAACAAACrrTCTT^ 

TTTCCTTCTAAATTGTATCACT^^ 

GGAATCCCTCACCTCCATACTGAGTAGTAGAGCTGGCTG^ 

TTATAACAAAGTCACCCTTTCAAAAACATGTCTTCC^^ 
[A,G]

AACCCCACACCACCTCAG CTAAA TGGGG C 1 11 Vrrr ATTTAAGTACX^ATAAAGACATAT 

TTTGGATACTAGCAATTTATTTTCCAAATGCTATCTTTO 

CCAAATCTATATCTCTACAAGTT^ 

ACTATGTGTTCTACAAAAGAAACCGAAGTAAAATTTAC^ 
TGTGATTGAGTGGGAAGAGGCGGACCCTACAGATAGAA^ 
17908



17884 ^^CTGGAGGGAAAA^ 
GACTAGGCTTTTGGCT^ 
GGGCTt^CCACCGTTTTGTC3^0AAAAGA 

CAAAGTTCACCAACTGACAGTTTCCCAAAGTGACAjGAACCA 

TTATTGTGAGGCCTGGAACCrACCAGAACCCATGAra 
[G,A] 

TTGCATGCACCAAGTTATATTATGTTGACAATTA^^ 
AAACTIt^CTATAAAATGGGTTCACAAATTTTA^ 

CATGC CT AAA CAAAAAGATATTCCTGTTGT AATAAATTTT CTTTCTG T CATGGTGGAGGG 
GGAAGACTCATATCAGTTGCAGATATTGCT CAGAAGTTTCAATTGTGTTATTTTGAAAAA 
CTACATAGCAGAACACQCATQTC^TATACACAAATCCA 

GGA^CAGGGACTCTCATGTATTCTATGTCT 

TCATAAACATTACCTTTAAAGCAGTCITGAAGTATAG<^ 

AAAAGACTAAGATTCAGGAAGGGTAAGAAATATGTTCA^ 
CCAAACT1X3ACAGAACCAGGAATCAA 

AGAACCCATGACGTGGGGAAAACCCMCAGCT^ 

I v> , 1 J 

TTGACAATOATATTATTTCAACCAC^ 

CAAATTTTACCTGTAATGTAACCGAATt^CATAAG 
TGTTGTAATAAATTTTCmCTX^^ 

ATTGCTCAGAAGTTTCAATTGTGTTATTTTGAAAAA 
TATACACAAATCCATGAGCCTGTATGACTCMAT^ 

ATTATA^TTATCACTTCCCTCAATTAAGG 
ATGGCXTTTTACCTTAAGTAACTAATTTCT^ 
CTTTATAGAAGTGAAATTCACAOyUVAAGAGTTGAGG 
AGAATCAAATTTAAATCTXTOATTTOT 

CAAACACATCTTAAAAATTCTCGCT^^ 
[T,C,G]
JCAO^™ 

GCTX^mGCACAGCMCrTCTTCC^^ 
GGATACCAGACCTAAGTGTTACAGAAGAGATTC^GGG^ 
AACAGGACACTCTGCCCTTGTAAGGGTCTAGCTt^ 
AAGTAAAAAATCTGAGGTAATGGAATGGGCAGATXn"^^ 

21222 * rcTTTCTT ^^ 

TTACATGATGCTTTCTGTGTAATTTCC^ 

CTGAACTATCTCAAACCMAGT^^ 

TTmCTAGGCTCAGCTTCTAA^ 

TAGGATATGOTAATGGAGAACCTGATTTGAGAGTC^ 
[G, T,A] 

GCCTATAATTCCAGC^CITTGGGA^ 
AAGACCAGCCTT3GCCAATATGGT 
A^CCTGGTGACGGGCACCTGTAA 
"GAACCC3GGGACXKX3GAGGTTGCA^ 



20551 



21232 



21353 



ATGGGGAAGCAAATGAATAGAAGT^^ 
CTTTCTCTGTAATTTCCAATAAATAGTTAA1TTCT 

TGAAACX!AGAGTAAAGCATAAATTGTTC^^ 1X3TTTTTTCTAGG 
CTCAGCTTCTAAACrrCAGCTTATm 

TAATGGAGAACCTGATTTGAflAGTCAC 
[G,A,T]

^^CTTTGGGAGGCCGAGGOSGGT^ 
^CCAAraTGGTGAAACCCCGTCTCTTC^^ 

^GCGGA^ 

GCAAGACTTCXZATCTCCAAAAAAATAAAAAATAAAAGAGTTACC^ 

GAAACCAGAGTAAAGCATAAAITGTTCATTGGCTGCC^ 
TCAGCl^AAACTTCAGOT 

^^ACTOGGGAGGCCGAGG^ 
TGGCCAATATGGTGAAACCCCGTOTCTTCTAAAA^ 

FIGURE 3 



23/30 





WO 02/26947 PCT/US01/29960



21904 



22132 



22369 



22742 



[C,T,A] 

CGGGCACCTItrrAATCCCAGCT^ 

AGGCGGAGGTTGCAGTGAGCCAAGATCGCGCCACTGCACTC^^ 

CAAGACTCCATCTC CAAAAAAATAAAAAATAAAAGAGTTAC CTGACCAATTCTAACTCCA 
CTAAGT CAC CACAGGACCAC CCAAATAATTGGCT CATG C CTTTGTCTTCATTTTCTCATC 
Tt^AAAATTCCAATGGTAATGTTTGTT CTT CCTGAAAT CLACAGAGAGATTAT AACGATAT 

CAATGGTAATGTTTGTTCTTCCTGAAATCACAGAGAGM 
AGAAAACACAATGTGAAATAAAGAGGCrnm'ACTAATG^ 
ATGCTTT GGAA ACCTGAAATCATTAATTTO 
CTrTGAAAGTTTCAGAATGTTCAATGTAGAAAG 
TAGGGGGTGTTGCTTTTCTG<X^CAGAAACXrTCT 
[ C,T,A ] 

GTTTTGOTCTGGCCCACTtKKOTTGCT 

TCCGCTACCAGCCTGGATCCCATGCE 

TTGTCTGAGCGAGCACAGGGTCTGGCCACTGCCC^ 

GACX3GGCAGCTCCAGGCACTGGCACAGGTGTGCTCT 

AAAGCTaCTGCAAGCMCTTCC(nX3GCAGG<^^ 

GATATGGGAGCCTAGGXjGGTGTTGCTTTTCTGGCC^ 

TGCCTTTGCCCAAGTTTTGCTCTGG^ 

ACIOTGCCC^CTTC^ 

GGAGCTTOTGAGGGTTGTCTGAGCXIAGCACAGGff 
CTGGCTGCAGCATGACXX3GCAGCTCCAGG 
[T,C,G] 

CTGTGGCTCGACAAAGCKACTGCAAG 

GCACCX^GGAAGCTTGGAGATGOTUSGAACTC 

TGGCTTGGGGAGCIX:CX3U3GTCTaGGATCC^ 

cct ^CAATGTX3GCCAGCAAG<3^ 

CTTGCTATTTGGCAGGTCCTGAGTTCTTGT^ 

ACACTGGCTGCAGCAIX^CGGGCAGCTCCAQ 
GAGGCTGTGGCTGGACAAAGCTCACTGCAAGC^^ 
GGTGGCACCCAGGAAGCTTGGAGATGCXZAGGAACIXX^ 
ACCCTGGCTT^^ 

TTTACCCACAATGTGGCOUSCAAGGGGTATGTTTC^ 
[T,A,G,C]

AGTu-ii\jCrA , l^XXX3U^TCXTOA G 

AGAC^GTGGAGGGTGAGCAAGAOSAAGAAAGGTTTAC^ 
GAGACCCACAGTGGGCAGCTCCTCTTCATAGCCaG^ 
TAGCAAA GAGG AGGCCCTGGAGGTAGAAGCTCC^ 
TGTTCAGCTTTCAGCAC»£3tf3TAGGCACT 

GGGCAGCTCCTCTTCATACXJCAGGGTGTCCCAAC^ 
GGCCCTGGAGGTAGAAGCTCOTCrrc^^ 
AGCACACAGTAGGOVnyyGGCCCTAC^^ 
CATGGTCTCCCAGTCACCTCTCCATCTGC^ 
[-,G]

CCCACCX:CTCCGTGCCTGACCAAGCTGCrCCre 
ATTGTGGTAGCTCC CAGGGTGGCAGGCTCTGGG 
TGTCCACCTTCTCCCCACXXXXn^CC^^ 
CCAGGTTCCXjGAGCAGGAGAGGCTC^ 
GTTGGGGGTGGCACAGTCGGCTGCCTCAGGG^ 



22882 



c'tc'tctgcaggcagg^ 

ccctagagtcgtctatctcotctct^ 

tccatcttka^gggtccaatgctgcctcca^ 

CCAAGCTGCTCCCCXACCAGTGGGCAACTC^^ 

TGGCAGGCTCTGGGGGGCTCCCAGGGATGGGCTCCAA^ 
[C,T]

CCTCX!CTGCAGTGGCCATGGTCAAGAATGGCAATGTG 

GGCTCCAGGCCTGGGAGCAGGTCCTGCCTtXSTCAC^ 

CTGCCTCAGGGATGTGGGACACAGGGGACa^ 

TCCTGCTACCACTGCrcCAGACAGCCTGTAGC^ 

ATTCAGTGGACAGCTCAGGAAAATCTTTAC 
23316 GTGGGACACAGCXX^rcCACCACCATCACn^

GCTCC^U^CAGCCTGTAGCTGCCATCACTAGCACTTAAG 
CTCAGGAAAATCTTTACCTCAATTTTTT^^ 

AATTTATGGACTACCAATAAATAGAAAACraTAGAGATTCTA^ 

TCCTGTAG CCCAAGATTTATTT ATAATTTCT CAAGAAT CIXJTATTTTGTTTTGAC^AAAA 
[A,-]

AAAACTGTCnX^TGTGGGTCXrrrC3W3GA^ 

CTTCTTTGCATTGCAAACACCAAGGCTGTAGTCAA 

GACTITTGCTTCATTTTTCATCATGATACTTG^ 

CTTTCTCCCAACTCAGAACCGTGTTAG^ 

TTTATTTCCATTTCTATACGAGGAA 

23867 TTTCTATACCAGGAAAGrTAAAAATCTTTGGTCAAAATO 
CTTGTGTATTGACAGTTTGTTTCC^GGTOT 
C^CX^CCATTAT ACTTATCCTGGTATCATTCCTGGAA^ 
GACTAAGTTGACAAAGTTTGAATTGAAGAATTCTAACT 
GCATTACAAAGGACAAAATATATAGTTTTCTTAAAAATGA 
[A,G,C]

TACATTTGACGGTAAACTGAGTTCXn'T CGATAGAATAACX!ACT AACZAGCAATCGATGGTC 
CTGAGCAATTGACTCTTCACCATACAATGATTTC 
TGAATATTTTCAAAAGCTCCCACrTTGTAGAGTTT 
TTTGTAGAAAGTTAGTAGAATGAAAO\ATCTT^ 

AGAATGTGTCTGAGAAACATGGCACTOGTA C 

23954 TGTAATCATTCTCCCTTAAAATCCGGTTAT^ 

ATTCCTCGaAATGGCTAACTTGCATCCTGCrrC^ 
GLAATTtnAACTTXATGCTATTTTCCACTTTAT^ 
l-rL7i _ iAAAAATGAAATAAATTTACTGCCTTAAACT 
TCCATAGAATAACCACTAACAGCAATCGAT^^ 
[A,G]

TGATTTGGGATGCCTTTAAGGGTATATTTGAATTGAAT^ 

TAGAGTTTAXCATCACTAGTTTrcCCA^3TGG^ 

ATCTTATTTTGTATAATGAGGAATAGAATA^ 

GTAGGAAAAAGTAAACAGTTrATTCTCATCTGCTCAATAAO 

AAATCATCAAAATTTTCMGAAACCTTCC^ 

26548 AGTGCCAGAAATTAGACCAGGAGTTGGTGGTACCAT^

CTAAAATTAGAATTCCAAAGTAGAGAAAGATATAAATAAATCAGG 
GTGATTAATGCTATGACAG^GGAAGTGCATAGTGCTAT^ 
TAACXnXn*TCTCACACAGTAAGAAAGTGAACCX7IX^ 
ATCCACTGACAGGTGCXXSTAAGTGTCC^ 
[G, A] 

GGCAAGTAAGAATGGGGTCATTTCCTGTAATTAjCAA^ 

TGGAGTCMTGCACCCAAGGCX^ 

CCATCAGTGTGGGQCCACCTTX^TTAGTA^ 

CCAGAAGTAAGTTATTGACCTTAAGTTAGAACCCACT 

26573 GGTGGTACCATTGTGAATA7\AACATGATCCCnX^^ 

AAAGATATAAATAAATCAGGAAGTATGAAAATAATGTGATTAATG 

TGCATAGTGCTATGAaAGTTQATCAGASM 

GTGAACCCTGAAATGTGAGAGAGAAGAGGCCATGAA^ 

CXTKX3GCAGGAGGAGTAGTATACX3AAAAaCT 

[T,A,G,C] 

TGTAATTACAAQATOTTTCrirrATAACTTAATGAT^ 

CGAGTTCTTCCyvrTAAACGTCAACA^ 

TGGCCTTGGCAAGCTTCCCTTCAGTATGATAAC^ 

AGTAACACATGGCITXn'CACTGCAGCACACTC 

TTAGAACCCACTTCTGCTAAAAAGCCCTGAGTTTTGTCATATO 

27400 TAATex-i-i-i\i-rCAAACAATGCTCTCCA^^ 

GGGCCAGTCTCATACTGATCTTAAATAATCAAACTAATTC^ 
CAATAAATGCOGGAAGTTGGTAACCGTGMGATGGAGAACT 
TTGACATATGAAGATCTGTGGAATCAGAACAGTTTACAA 

ATGATAA?U3ACAGGCACITCAAAAGAGATTC CICGGAGTATCAAAGGATTCATAGT^GGC C 
27788



28069 



29269 



29537



29726 



[A,G,C] 

TTGGGCCACTCAATGTGACCTTCCGATAATAGAGCATCT 
GACAAAGCTGAAGTGAAG^ 

CATTTTATTAAATATA^^ 

AATAACCAGTCATTTTTTATCACTATTACA 

TAACTGTATCGCCTTTCTTCTTCATTGCCAATTATTACAGTA 

TTGTGCTATC CTATA ATTGTTTCTX3A^ 

^^^^^^^^^^^^^ C^ACTTTTCAAGATAATAAC CLAGTCATTTTTATCACTATTA 
CAITTAGAATTTTAGArX'lUriTCTAAGTAGATTA^ 

CCAATTATTACAGTAATAACAAAGACTT CTTGAGTATCTCTATATAATAGG TGGCAGCAG 

GATTTAGTGGGAAAAATATGT CCCAGGCAGTTGGAGAG CTIX3GGCAAATTATTGAACCTTA 
[G,-]

TGTATO^ 

CTTTTGTTACTATAATAAATTTCATTTGC CTTAGGAGCATAAATCTTTATAGAGACTCTrTA 
ATATTCCAAAGAATATACATATTAAGAATCTAGGCTTGG CATGGTGGCTTCATGCCTGTAA 
TCCCAGCATTTTGGGAGGCCGAGGCAAGAGGACCACT^^ 
CTTGGGCAAGATAGTGAAACCCCATTGGGCATGGTGGTGCATA(^^ 

GGCaAATTATTGAACCrrrAGTGTATTAGGTAATA 

TTTGACCTATAAAATTCTAACTTTTGTTA^ 

AATCTTTMAGAGACTCTTAATATTCCAA 

ATGGTGGCTCATGCCTGTAATCCCAGCATTTTGGGAGGC 
GCK^GGAGTTCAAGACCAGCTTGGGCa^^ 
[T,G,A]

TACCTATCATCCCAGCTACTTGGGAGGCTAACGCAGGAGGAT^^ 
TGAGGCTCCTXSCAAGCTATGtfVTTG^ 

CCCATCTTAAAAAAATAGTAATATATTTTTAAAA^ 

GAAAGATGTCAGAGCTCAGTAAGCTGATATArrAGAAAGCCAGAAA^ 

GTCTGGTTTTTOVAAGTAATGGG3VAAOT 

CATA TATGTG TGTGTGTATATATAAAAAAAAATATATATATATATATATATATATAATTA 
CCT(Ai-x-i -i-iCC AGAA^ 

CAGATCTGTGACTAGlXm^CACATAACAAAATAAA^ 

AATAATGTAAATTGGTGGGAGACAGTGTTTTATAAAGGGAA 
[C,G] 

ATATGTGATGTGAATC^ 

CACAGTCTTTACTAGATGATCTTTCATTC^ 

ACCATAATC^CCCATTGCCX^GCCCACAAGCT 

GATCATCTCTCAAAGGAOTATGCAGTCATCTAATAGA^ 

TTO^AGAArCTACTCCCCAGAAAGAACAAACAT ^ 

TTTATAAAGGGAAGAGGAGAGAGAGGCAGGCAGAT^ 
CCTATCCAGG^i-x xaTTT TCCTTAAOT 
TGCTACTAAATGATTTTTCCGATTCe^ 
ACAAGCTAGAAGTCAACCGCATTTACCACATTT^ 
ATCTAATAGACTTTACCACATCCA3T 
[C,A]

AACATGTTTTTTAAAAATGTAAATGAGACTACAT^ 
ATTCCCATATCACTTCAATAAAATTTAAGCACTT^ 
ATCTAGTCCCTGCI-rACCTerca^GCT 
CTGCAGCC ^'TCCAAC^^ 

CCTCACACTG Acccrrcrrm 

AAGTCAACCGCATTTACXIACATTTGATCATCriX^ 
ACTTTACCACATCXIATTCT^ 

TTTTAAAAATGTAAATGAGACTACATTATTCTCTGGCTTAATTATC 
ATCACTTCAATAAAATTTAAGCACTTT^^ 

CCTGCTTACCTCTCCAAGCTCACCCCCAACCATTCTn 
[T,G,C] 

ATCCAACCCAAGACXrrTGGGATTTTT^ 

GACCCTCTTTTACTATGTCTTAGCCCAAATGOGTTArCAA 

AGTACTCTATTCCffTTACCCTAT^ 

ATTTATCTATTTGTTTGCra 

GAGCAGGCACCTTAGGTCCTGTTCATCACTTTAT^ 
30496



30695 



30752 



30849 



30900 



30904 



AACTATAACTGGAGCCAAGTATTCT CAGGATTGTATTT CTCTTATATTAGCCTAAATGCA 

ATTAATCTAGCTC^TATACTTTGGGCAGCTTATATATATTCT 

CCAGGTATAAAAATCCACATCAATGGACTGTTAGTTT^ 

T AATGAAAAGAAATGT CAGAAGATTTATTAT CCATGAGAAGTAC CGCT CTGCAGCAAGAG 
AGTACGACATTGCTGTTGTGCAGGTCTCrrTCCAGA^ 
[C,T,A] 

GATTTGTTTGCCAGAAGCCTCTGCATCCTTCCA^ 

ATTTGGAGCACTTTACTATGGTGGTGGGTATOTCAGGA 

TGTCTAAGGCAATGTC^TTTCATCTCCATCAA 

TCTGGTTGGATTAGTTAGGGTTCTTACTTTGTGTGAC^^ 

GTGCAGAATAAAAAACAAAGAAACAAAAACTTCCACAAATT^ 

AAGATTTATTATCCATGACAAGTACCGCTCTC 
GCAGGTCTCTTCCAGAGTC^CCTTTTCGGATGACAT 
CTCTGCATCCTTCCAACCAAATTTCACTGTCCA 
TGGTGGTG<MTATCTCAGGATAGCTAACAGAGCGCTAAGCC^ 
TTCATCTCCATCAATATTATCCTGACAGCCATTTCCACAC^ 
[A,C,G] 

GTTCTTACTTTGTGTGACAGAAATTCAATTCA 

GAAACAAAAACTTCCAGAAATTTGGCTCAT^ 

TTTCACT TCAGACAC^GGGGTTTATAT^ 

TTTTTTGCCCXrrTCTTTTCr^^ 

TGTCAGAGGTCGCTAAC^CaViXrrCA^ 

TGTGCAGGTCTCTTC»GAGTCACCTT^ 
AGCCTCTGCATCCITCCAACCAAATTTGACTC 
CTATGGTGGTGGGTATCTCAGGATAGCTAAC^GAG 
GATTTCATCrrCCAT^ 

TAGGG ' I ^CTTACTTTGTG^ 
[T,G,C] 

AAAGAAACAAAAACTTCCACAAATTTGGCTCATGTAAT^ 

AAGTTTCTVCTTCAGACACAGGGgrTTA^ 

GAATTTTTTGCCCCTTCTTTTCTC^ 

TAGTGTC^U3AGGTGGCTAACAACACerCAACAC7VTC^ 

AGAAAGGAATATTTATTTCTTTTCTT^ 

ATCACAGGATTTGGAGCACTT^ 

CTAAGCCCTGTCTAAGGCAATGTGATTTCATCTCCA 
CCACACAGTCTXXTTTGGATTAGTTA^ 

ATTAACX1AGTGCAGAATAAAAAACAAAGAAACAAAA 

A^-i"i\iGAAGTCAAAAAAGT6TAGTAAGTTTCACTTCAGAC^ 
[A,T] 

TCTGGCTCTGTGTCrCTGAATTTG^ 

TTCaGACSGGATGCTAGCTTCACCTACm^ 

TCCTCAACAAAGAAAAAATACATAGAAAGGAATA^ 

AC ATTAATT TCTATTGTTCCAGCTGTGTCIAGGAG 

ATATTCTTTATGCCTATGTAGCAAAATTTG 

AACAGAGCX3CTAAGCCCTGTCTAAGGCAATGTGATTT 

CAGCCATTTCXZACACAQTCTtSGTTGGATTAGTTAGGG^ 

TCAATTCACATTAACCAGTGCAGAATA^ 

GCTCATGTAATTTGGAAGTCAAAAAAGTGTAGTAAGTTTC^ 
TATGATGTCATCTGGCTCTGTGTCTCrG^ 
[G, A] 

TTGGCTTCATTCAGAGGGATGCTAGCTTCACCT 

AACACAXCATCCTCAACAAAGAAAAAATACATAGAA^ 

CCAGAATTCACATTAA1TTCTATTGTTCCAGCTGTCT 

CTAACTCAAATATTCTTTATGCXrrATGTAGCAAA^ 

TTTAAGTGTGATGGTGAATAAGAATAGTGTAGAGATAAATT^ 

GAGCGCTAAGCCCrrGTCTAAGGCAATGTGATT^ 

CA^-iri,tXZACACAGTCTGgl'ltjGATTA 

TTa ^TTAACCAGTGCAG^ 

ATGTAATTTGGAAGTCAAAAAAGTGTAGTAAGTTT^ 
ATGTCATCTGGCTCK3TGTCTCTGAATTTGA 






31664 



32014 



32197 



33074 



33505 



[G,T] 

CTTCATT GAi3 AGGGATGCTAGCTTCAC 

CATCATC CT CAACAAAGAAAAAATACATAGAAAGGAATATTT ATTTOTTTTC^ CAG 
AATTCACATT AATT TCTATTCnTCCA^ 
CTCAAATATTCTTTATGCCTATGTAGCAAAATTT^ 
AGTGTGATGGTGAATAAGAATAGTGTAGAGATAAATTGTCAA^ 

TGAQAAAATGCAAAAGGGCTTTCrGAGAATGACT 
TTTATTTACATAC AAGA AATTATAAAGAATAAGCTTTTGATO 

AACT AGGAATAACCTTTCZACTCACATAGGCAGGAAT CX3GTTTTAGGGTCTCTAGATTTTT 

TCCAGATGTCCCATGTGGTTTTGTTTTATCTTATACAGA^ 

TTAAGGTTGTATTACCAATCACAGAAAATATTACCTATGGT^ 
[T,C]

AGTGCTGCTGTAAGCCTGACACCT 
AAGATAG<^3CTTGCATTCTCrrGCTTCA^ 
C3CTIt3GTG<3GAAAAATCACTTTCAGGAGTTT^ 
ATGCGCTGTGAGTGGCAACTUCSAATCTGACACr^ 

GGCCTCTCTTGG TGAGTGACCCAC71GGTG CTTTTAATCTATTAAATAGATTAAATTAACC 

GATTTT CTCT GAAGATAGGGCT^^ 
AATCCCCTTTGG CTTGGTGGGAAA^ 

TACCTGTCATAATG CGCTGTGAGTGGCAACAGAATCTGLACACTTATA^ CACTCCACCC 
TACTTGAACACGGCCTCTCTTGGTGAGTGAC 

TTAAATTAACCTATOVTTCTTAATCTGTTAAGTACATTAAT 
[T,C,G]

TTACTC ^CCAAG AGAGGCTATATTCAAGTCTX^AAAG 
ATTGAAATTGTACAAAGTATATTCTCTGATC^ 

AGAAAGATAACATAAAAATCCCCAAATGCTTACCAATTAAAAAAJCAT^ 

GAATATCTCXaAAQAAATTTGTAAAAACAAATAGftACTAAATGAA 

TATATGCCAGATGC1X3CTAAAATAGTGTAGAA 

TTGAACACGGCCTCTCTTGGTGA 

AATTAACCTATCATTCTTAATCTGTTAACT 

ACTC^rCAAGAGAGGCTATATTCAAGTCTGTAAA^ 

TGAAATTGTACAAAOTATATTCTCTGATCATAATGG 

AAAGATAACATAAAAATCCCCAAATGCTTACCAATTAAAAAAC^ 
[A, G] 

TATCT CGAAGAAATTTGTAAAAACAAATAGAACTAAATGAAAACAA 

ATGCCAGATGCTGCTAAAATAGTGTAGAAAGGGAAAT^ 

GAAAGA!TATCAAATCAATAATTAAGTTCTCACTTCA^ 

ACCTAAAACAAACATAAGGAAGOAAAIAATAAGAATAA^ 

AAATAAACTATAGAAAATTGATAAATAAAAAGCTGACT 

TCGGCTCACTGCAAACrcCGCCTCC^ 

OTAGC TGGGACTGCAGGCGCC^ 

AGAAGGGGTTTCACCXnxrrTAGCCA^ 

GCCTO3G CCT(XXaAAGTGCTGGGAT^ 

TGl a x-x"i"X'AATTGATGATATAGTAGGCAATATAAATG T ^ 

A^ATATATATAAACCAATTGTATrCAAATAACAGAA^ 

ATTTCTGAGTTACACACTTAAATCTTCCXaAGCACT^ 

CTTCAGAAATAAATCT^^ 

AATGTACTTAGTTTTTTTTTTA^ 

ATAATGTGTTGAAGTATAGTATATGTACACTGTGAGTGTTA 

AAATCTTGGAAATCGTCTTCTAAAGA^ 
GTTTTTTTTTTAATTGATGTATAAAATTGCATCT 
GAAGTATAGTATATGTACACTGTGAGTGTTAAATCTAGTTAACT 
TACATAATTATCATTTTTGTGGCAAGtAAO^CTTAATATCT 
GAATACX^TATATCAAO^AGGCAACCAGAAGC^^ 
[C,T,A]

GGGAGATGCTGGTCAACAAATTCATATTTGCAGTTA 
TCATCCATCATGGTGACTATAGCTGATGA 

ATGTCTAACAAATAATCACAAACAGTrAAAACA 
TTCATOAGTCAGACGTT^^ 

CTGTAATCAAGGTGTCAGCTGGG<nriX3TGGC^ 
33551



33801 



34648 



34754 



34867 



35013 



TTCAAATGTACTTAGTTTTTTTTTTAA 
CAACATAATGTGTTGAAGTATAGTATATGTACACTGT^ 

GAAQCGTCTTATTTTACATAATTATCATTTTTGTGG CAAGAA.CACTITAATATCTACTCTT 

GTAGCGri'rCTCAAGAATACGATATATCRACAGTAGGCAACCAGAAG 

CAGGGGAAGGAGTTAGGGAGATGCTGGTCAACAAATTCATATTTGCAGTT 

[A,T]

GTTCAAGAGATCTCTCATCCATCATGGTGACT 

TTAGTTTTTOATAAATGTGTAACAAATAATCACAAA CACTCATTTATT 
TTTATCTCACTGTTTTCATGAGTCAGACGT^ 
AGGCTCTCACCAAACTCTAATCAAGGTGTC 
TTTGAAGGTCTCCTCAAGGTTTGCTGGCAGAATTCCTT^ 

AGTTAGGGA^TGCTGGTCAACAAATTra 
ATCTCTCATCCATCATGGTGACTATAGCTGAT^ 
TATAAATG TGTAACAAATAATCACAAACAGTTAAAACAGCACT 
CTGTTTTCATQAGTC3U^CGTTaWGACA 
CCAAACnTTAATCAAGGTGTCAGCTGGGGTTGTGGCC^ 
[C,A,G,T]
TCCTCAAGGTTTGCroGCAGAATTCCTTT^ 
TGCTTTAACTCTTTAGGAAAGTGTCT^ 
TCAGCTGATTAGGTCAGGCCCACCrTTGAlTVATCTC 
TCATTAGAGGTCTTAATCGCATCTCTAAAATTrc 
AATCATGAGAATGGC^TCCCTCATATTCACAGATCCT^ 

TATATGTATATTTCZACATATATCTTATATATGTGSAAAGCTCATCATAA 
AAAATAAATGTACATAGTATTATAGGCATT^ 
C^TGCAGAGTTTCTQGGAACAArCTGGAACCCACA^ 
GGCCTTCCTGAAATATATAAGCTt^TTATTTT^ 
TCCAAArGGCTTTCTTGCTTTGAGAAG 
[T,C,G]

TTAGATGTGCCAGATTTAACCAGAAATTC^ CTTATATTAACTTCA 

TTCTGTATTGACAATTTTACXZATGAAAAAAATATTA 

GCCAAAGATGCTGATTGTAAATACTAGAAT^ 

AATCATCTCCGAGAAGCCAGAGTGAAAATCATAAGT^ 

GTGTATGGCAATGATATAAAACCrroGAATGTTCTt^ 

QAAAACCATCTAGGCATGCAGAGTTTOTGGGAACAAT 
TACAAAAGATAA?IR)GGCCTTCCTGAAAT7ATATAAGCTGATT 
ACCAGGAAAAAGAATCXMATGGCTTTCTTGCTT^^ 
GGAC AATAATTATCGTTAGATGTGCCAGAT^ 
CTTAXATTAACTTCATTCTGTAT^ 
[G,T] 

CTGW^rTCACTCTAGCCAAAGATGCTGAT^ 

AAG(XK3AATCXXMAATGATCTCa3AGAAGCC^ 

GCAASCAACeACMSGTCPlT^ 

TCGAAGGAATTTATGATGCCTGCAGGGT^^ 

AAAAATTIXnATCTGGCTTAGAATATATTA^ 

A^TTTTACCACSGAAAAAGAATCCAAATGGCriTTC 
TGTGATTGGACAATAATTATCGTTAGATGTGCCA^ 
GAAACTGCriATATTT^CTTCATl'C'lXJlA 
AAAGTCTT^ 

TTTTCCTTAAGGGGAATCCCAAAATQATCnXX^ 
[T,C] 

GATGTCTGCAAGCAACCACAGGTGTATGGCAATGATATAAA^ 

GGATATATGGAAGGAATTTATGATGCCTGCAGGCT 

CTAACTCAAAAArrTGTATCTGGCTTAGAATATATT 

TTTAAATGTTTTATTAAGATAAATGAAGTAAGAGTT^^ 

GTATTGACAATTTTACCATGAAAAAAATATTAG^^ 

AAGATGCTGATTGTAAATACTAGAATAACTCTATT^ 

^^^CGAGAAGCCAG^^ 

ATGGCAATGATATAAAACCTXSGAATGTTCTGTGCCGGATATA^ 
CCTGCAGGGTAAGTTGGAGGGATTTTTTTATATTACTAACT 
[C,T]

AGAATATATTATATGTTCTTTACATAAGGACAAAAC^ 

AGTTACAAATGCAAATTT CACAGCACAAAATACTTTTAAATGTTTTATTAAGATAAA 
AGTAAGAGTTTCTCTGATGCTATCAAACAAACAAA^ 
AAAGATTAATAAAGCAGTTTATTTTCTCAAGCGGCTCAC^ 
TAAACAGAGAAGTATAAAGTGATGTTATGAATAATATAA 

35225 G^CGGATATATGGAAGGAATTTATGATGCCTrGCAGGCT 

TTACTAACTCAAAAATTTGTATCTGGCTTAGAATATATTATATO 
AAAACATAGATATCATGTCAGCTCAAAAAAGTTACAA^ 

ACTTTTAAATGTTTTATTAAGATAAATG^ CAAACAAA 

CAAAATTAGAATTTCTTAACCAGAAATCCAAAGACT 

[C, A, G, T] 

GGCTCACATTCAAGAAAGAAAAT 
AATATAATGAAAAGCAAATATTTTTCTTGAAGGAA^ 
GATGAGACXJTAAJVT AAGG CCTTGAAGAATAAATAACATCC^ 
TGTTATAGAAAAGACAAAAAGCATAGCCAAAATTAT^ 
TCTGAGGGAACTCCAAGTAATTGGTTGGGTCTCAGC^ 

35517 TTC TCAAGCGGCTCACATTCAAGAAAGAAAATAATCA 
GTTATGAATAATATAATGAAAAGCAAATATTTTT^ 
TATC^GAGAGATCAGACGTAAATAAGGCCT^^ 

AGAAAATAATGTTATAGAAAAGACAAAAAGCATAGCCAAAATTATGA^ 
CAATTCATATCTGAGGGAACTCCAAGTAATTGGTTG^ 
[A,C,T,G]

GAGAAACAAGTAGATAACCATGAGAAGGTCGATTTAGGCCATGT^ 

TCC OZAGTG(X:crrCATCTGCCTTCTA 

CTGGAGACACTTGOTTTTTAACATGAGATACTT^^ 

GGAAATGATGGAATGGTATTGATATCRGGTGGCAGAAA^ 

CTGTACCACATGTGCGACCTCTATCAGA 

36885 TAAGAAAACTAGAGGATAAGCTOVGGAGATCCAAC^CCAA 

CATAAAACGCGAGTGTACAATATAAAAAAAAATAAAGAATGCT 
CATGCATCCTATTGAAGAAAAGtTTCCAAGTAG 
TTCCAAGACATACCATCATAAAGGGTCAGAAGCCAGGGAT 
TTTGAAGG^GAACCATCAGAACTACATAGAACTCCT 
[C,G] 

GATGGTGGAAAACACATTTCAAATTTCAAAGGGAAG 

ATGCTAACTIAAATATCAACTGTGAGGGTGGAATT^ 

AAAAAATOTACTI^CrniATACCCTACTTCTTAGGAAA^ 

AATGAGGGAATAAATCAAGAAAGTGGAAGACGTAAGACCTCAA^ 

AAAGAGTGGTATCAGATAATCCCAACACCATAG 

38527 AAGAAAGCTGTATCCAGGTCATATATGCA^^

AACAAAGTGATATTAAATTACTGGATCTAGTAACA^ 
GAATCACTTTAATCftACCAATAATCCTl^ 
TAATT^ATCTGGCTTGAAAAATAOSGT^ 
AAfflACCAGTXSAGACTCAATACATATTTTTGAAGATA^ 
[G, A] 

TTGTCAAGGGTCTCCTTTTAACTGAGAAACTTTT^ 

CTTGTATAATTCCCTACATTTCTCTOGAGCTCACAAA^ 

TGAATGAGATTTTCCAAAGTACCTTTCC^ 

CATTTGAGAGAGACtZAATAAAAGAAAGTC^ATGTAGGAA^ 

AAGCCAGAGATCTTCTAACTTTTTTTAGTTAT^ 
SEQUENCE LISTING

<110> PE CORPORATION (NY)

<120> ISOLATED HUMAN PROTEASE PROTEINS, NUCLEIC ACID MOLECULES
ENCODING HUMAN PROTEASE PROTEINS, AND USES THEREOF

<130> CL000862PCT 

<140> TO BE ASSIGNED 
<141> 2001-09-27 

<150> 60/235,557
<151> 2000-09-27

<150> 09/734,675
<151> 2000-12-13

<160> 4 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 1225 
<212> DNA 
<213> Human 

<400> 1 

cgcccttatg ctgaagccat ggatgattgc cgttctcatt gtgttgtccc tgacagtggt 60
ggcagtgacc ataggtctcc tggttcactt cctagtattt gaccaaaaaa aggagtacta 120
tcatggctcc tttaaaattt tagatccaca aatcaatttc aatttcggac aaagcaacac 180
atatcaactt aaggacttac gagagacgac cgaaaatttg gtggatgaga tatttataga 240
ttcagcctgg aagaaaaatt atatcaagaa ccaagtagtc agactgactc cagaggaaga 300 
tggtgtgaaa gtagatgtca ttatggtgtt ccagttcccc tctactgaac aaagggcagt 360 
aagagagaag aaaatccaaa gcatcttaaa tcagaagata aggaatttaa gagccttgcc 420 
aataaatgcc tcatcagttc aagttaatgc aatgagctca tcaacagggg agttaactgt 480 
ccaagcaagt tgtggtaaac gagttgttcc attaaacgtc aacagaatag catctggagt 540
cattgcaccc aaggcggcct ggccttggca agcttccctt cagtatgata acatccatca 600
gtgtggggcc accttgatta gtaacacatg gcttgtcact gcagcacact gcttccagaa 660
gtataaaaat ccacatcaat ggactgttag ttttggaaca aaaatcaacc ctcccttaat 720 
gaaaagaaat gtcagaagat ttattatcca tgagaagtac cgctctgcag caagagagta 780
cgacattgct gttgtgcagg tctcttccag agtcaccttt tcggatgaca tacgccggat 840
ttgtttgcca gaagcctctg catccttcca accaaatttg actgtccaca tcacaggatt 900 
tggagcactt tactatggtg gggaatccca aaatgatctc cgagaagcca gagtgaaaat 960 
cataagtgac gatgtctgca agcaaccaca ggtgtatggc aatgatataa aacctggaat 1020
gttctgtgcc ggatatatgg aaggaattta tgatgcctgc aggggtgatt ctgggggacc 1080
tttagtcaca agggatctga aagatacgtg gtatctcatt ggaattgtaa gctggggaga 1140 
taactgtggt caaaaggaca agcctggagt ctacacacaa gtgacttatt accgaaactg 1200 
gattgcttca aaaacaggca tctaa 1225

<210> 2 
<211> 405 
<212> PRT 
<213> Human 

<400> 2 

Met Leu Lys Pro Trp Met Ile Ala Val Leu Ile Val Leu Ser Leu Thr
1 5 10 15

Val Val Ala Val Thr Ile Gly Leu Leu Val His Phe Leu Val Phe Asp
20 25 30

Gln Lys Lys Glu Tyr Tyr His Gly Ser Phe Lys Ile Leu Asp Pro Gln
35 40 45

Ile Asn Phe Asn Phe Gly Gln Ser Asn Thr Tyr Gln Leu Lys Asp Leu
50 55 60

Arg Glu Thr Thr Glu Asn Leu Val Asp Glu Ile Phe Ile Asp Ser Ala
65 70 75 80

Trp Lys Lys Asn Tyr Ile Lys Asn Gln Val Val Arg Leu Thr Pro Glu
85 90 95

Glu Asp Gly Val Lys Val Asp Val Ile Met Val Phe Gln Phe Pro Ser
100 105 110

Thr Glu Gln Arg Ala Val Arg Glu Lys Lys Ile Gln Ser Ile Leu Asn
115 120 125

Gln Lys Ile Arg Asn Leu Arg Ala Leu Pro Ile Asn Ala Ser Ser Val
130 135 140

Gln Val Asn Ala Met Ser Ser Ser Thr Gly Glu Leu Thr Val Gln Ala
145 150 155 160

Ser Cys Gly Lys Arg Val Val Pro Leu Asn Val Asn Arg Ile Ala Ser
165 170 175

Gly Val Ile Ala Pro Lys Ala Ala Trp Pro Trp Gln Ala Ser Leu Gln
180 185 190

Tyr Asp Asn Ile His Gln Cys Gly Ala Thr Leu Ile Ser Asn Thr Trp
195 200 205

Leu Val Thr Ala Ala His Cys Phe Gln Lys Tyr Lys Asn Pro His Gln
210 215 220

Trp Thr Val Ser Phe Gly Thr Lys Ile Asn Pro Pro Leu Met Lys Arg
225 230 235 240

Asn Val Arg Arg Phe Ile Ile His Glu Lys Tyr Arg Ser Ala Ala Arg
245 250 255

Glu Tyr Asp Ile Ala Val Val Gln Val Ser Ser Arg Val Thr Phe Ser
260 265 270

Asp Asp Ile Arg Arg Ile Cys Leu Pro Glu Ala Ser Ala Ser Phe Gln
275 280 285

Pro Asn Leu Thr Val His Ile Thr Gly Phe Gly Ala Leu Tyr Tyr Gly
290 295 300

Gly Glu Ser Gln Asn Asp Leu Arg Glu Ala Arg Val Lys Ile Ile Ser
305 310 315 320

Asp Asp Val Cys Lys Gln Pro Gln Val Tyr Gly Asn Asp Ile Lys Pro
325 330 335

Gly Met Phe Cys Ala Gly Tyr Met Glu Gly Ile Tyr Asp Ala Cys Arg
340 345 350

Gly Asp Ser Gly Gly Pro Leu Val Thr Arg Asp Leu Lys Asp Thr Trp
355 360 365

Tyr Leu Ile Gly Ile Val Ser Trp Gly Asp Asn Cys Gly Gln Lys Asp
370 375 380

Lys Pro Gly Val Tyr Thr Gln Val Thr Tyr Tyr Arg Asn Trp Ile Ala
385 390 395 400

Ser Lys Thr Gly Ile
405



<210> 3 
<211> 38844 
<212> DNA 
<213> Human 

<400> 3 

ttatattcat aaaagtaggc agtaagttga agatttattc atataggatt tagtagctgc 60
agctttaacc tgtggcttct gtagcttttg taatctggca gtgcgcatct gctatattat 120
ctaaatgttt cctcaaaagg agaaacactc taacaactta tcaccctagt ctgctggcca 180
ccattttccc tcagatgctc acagcttctt ccgtgggatt tgaagatatg acttccatga 240
cacttgatca gtatgtcaat gggtattgaa ccactcttca gctctgatcc cacgqt t , ao ,nn
tccaSSc 22?"* t9tC "" t9 ^agatg tgattltttt atcStttc 360 
atttc^tga gtSaact? Ctaata ^ a aatagattga aagcttataa 420 

? gttttaactt ttctcctttg gtcttttttt cttttcaaat gacttgaaaa 4S0 
ataaatf^ 3 39a " Ctat 9 agaaaatgaa gagttgaaca aattgaatat gtatglgtga "o 
agg^gag aataaatS ITT™ tattaaataa "tgaacgaa atcaatlgag loo 
afttctattf t^ aa 9 tgtccta 9 aa 3taagaagac ctgagtttga gataactag? 660 
^of! tact 99 a 3 aa attacttaat catcactgga cttcattttt ctcatatgga 720 

aagtaattca atcacactaa acaatcttta aggtctcctt cacttataaa tgtatgtttt 780 
c:agc C :S a aS t taa c tcccatggga cttctgtttg SgtSatt III 

atta^?^ ^ 9 tatcacagga cctgctgcct ttccgcagcc agttctctag 900 

atcagtcggt gcacacatgg tcaatattta ctcaatagaa ticaggtttc 960 
aact 3 ^! t9a " attct tgattaattt tattacttat gccaaaacta ttatcttctt 1020 
tSaaaoca L^T 4 ttatcctggc atttatatat aaaaaacSt loll 

tgtaagaccg ggtgcagtgg ctcatgcctg taatcccagc actttgggag gccqaootaa n! 

ac?I~ C T a99tCa ^ a 9 atggagacca tcctggctaa caccagaaa cccSttct ^200 
aoa™ ? CaaMaatta 9<~g3gcgtg gtggtggacg cctttagtcc cagrtattca ^260 
cacS 9 ? 9 9Ca39a9aat ggc9tgaacc tgggaggcag agcttgLgt ga^cagagat llto 
cacaccactg cactccagcc tggcagcctg gatgacacag cgagactccg tctcaaaaaa lllo 
^a 3 ^ 33 ^ aaaa9aaaaa aactgtttta tagtcaaaa^ aaaaactttc tataaatcaa lllo 
atafcata-^ 9 aa 9 aaa ata tgaaaaatat cotctgtttc caaaaaaatt taggctatca Itol 
caaattttao a ' aaa9agat aaactct 9 at aaattggata aataaaattc actataatag llel 

=E2Z 2SS 3SSS Stat 9 ! 9 =£= s™ iS 

™ ™ 9 = S5SS c~ =sS 

SS Si — =2 =s= == 

=b= EH ?~ s = ™" = s= 

aStttaoot toatot 39 ? 3 aa9Ca " Ca 9 acaagtccag aaagtaagat attctaaagg 2100 
a S:^ tgatctcttc aacagtcaat gtcattaaaa actaaaaaag aagcaggact 2160 

tatSc? a g 3 : aaa ? a9 : a a aa : aa ? 9 tf srr- 8 t9cact9ca * »s~ss ss 

atggactgoQ tattaa^l ^catgtgtaa ttataatgaa accatggagg gaacttgaag 2280 
ta?cafc~f?? :^ a9 ? tatggcagaa atatcattaa ttttttagga gtgttaagag 2340 
tatcatggtt atgttggata tatcctaatt otctataafca at-n^i-^ s ° 3 3 ^■ 340 
oatcrttttat t-i-<-=.^=.^ = , ._ t gtccacaata atgatttggt aaaaagtcac 2400 

aattttcaac Lttoatl aata t a 9«g cagaaaaaat aaatgagcca aatacagtaa 2460 
tagtaatttt t a taatgtgat atatatatgg atgttcaatt atactattct 2520 

aa^tgag ataSgatt £E£"* aatacttaaa aataaaagat aaaagataaa 2^80 
agagataacf taaaatcact ttgtaaactc taaaaggata gacagataaa 2640 

3 ™f a f aagtgctgga gaaaggagga atggtccctt ttcaagcatg tataccacct 27on 
tggaccatgc tgctaagaga aaccattcct gaccaccaca aagagLcac caaatacctc lllo 
atet^ 3933 a9Ca 9* a 9 ca aoattaggat tcccagatcc tgataltttt ttttSaSc lltl 
^tgaaSg tctaa^aT T" cLtttgcag ggaaa'ggtag 2 2 8 8 80 
gagtaaccag toggccSS ll^^ll « ct 9 caa 9 cat tgctatctga 2940 

tgctgaagcc IZ^ZZatZ 2f f^ 9 3 aca g t 999a ttbggcaccc gaagcagaaa 3000 
ccataggfct ccgStcac tSa^ 3 ^^f 69110 Cct 9 aca g fc 3 gtggcaitga 3060 

22££ s 32 S2=S =3= 52! 

g ~ S taa - i I 52 s ™ -™ -~ a -OS 

gag« 33 ?a 3 aa9tagaCa f a ttt=taatt tccttagagc tctcaaccca gaatLtttt 3360 

aaa 3 :aSc: t^t^l t^ttaaaa t^^T taattataat 3«0 

ttqactcttt £f 3 ? ? tatattaaaa tgaaaagcta tgataaatta gttattaaaa 3480 
oacSSta ^ actca ^ 9aa ccatcatttt ctgtcoaaca tttctaaggc aaaagaaaaa 3 540 
attaSttt at^L 9 9aatttcaaa a ^attgaaa acctatalgt atgacacaat 3I00 

J££2£l SSS Sg££ t 3 r t T ta ttca999a " — 

ttacattcag tatotttot. ggga^aatgc ^agc 3 : 3 ' 3 : catttt^ ^aatctcta 3780° 
caatgtgact ctcacatgga totatotaat aaaacaaa h a a ^ caaatctcta 3780 

— — SSSSS :s 
cttaaagggc ttaagatcat agtcctaggc cttatatgat aaccccagct gtagtttata 3960
ccattggcaa aagattctca ggtcacttta tttggttgca taaaagtctc tttacaatga 4020 
gagtaaggtt tgttaacagt atggattata tgggtaagta atcaggatgt ccaaaaatgt 4080
attacaaggt ccagagattt cccacttaag acatatgcct tcctgatatc cctgtttctt 4140 
tccttggttt gtagtctcga aacccactcc ctcttccctg agccaggctt ctcaaggatt 4200
gaggttgttt tgtatttttc ccattctcta tctttaactc tgtatctttc ttactccctc 4260 
tgggccttac tcctcagatt accaaattcc ttaggagtct caactgcttt cctttcttac 4320 
atttcctaat agatttatcc ctgtttcatg ctcgtcttgt cttcaatctc agacagctct 4380 
tctctacact ttcttttcag gtttttctta gtgtgcctgg ctctcttgtt aaaaatcaaa 4440
attcacaagg acattcactt atctctactt ccactagagt gtatgatggt acacatttca 4500 
actcagcaag gagcaatgta gcaatgaaat gttcaagctc tacagctaga ctggatttaa 4560 
aacttggaca ggccacctac tagttacaga acaatttact taatgcctct gtgccttaat 4620 
ttccttatct gtaaaatgaa ggtgatacca atcttagaga gctggtgtgg ggattaaatg 4680 
ggctaataca taaaaagtgc acaggacagt gcctgccata ttgtagaaac tcaataaatg 4740 
gcagctatta taattgatat aaaacattaa ctgttatttt ttaaataaaa ctcaattatg 4800 
aagaggctca gggacatatt caagatttat attggcccca ttgtaattga gttctgaaat 4860 
ctttgtccaa accatttagt ttcctatttt tcatttccat tgcagaccaa aaaaaggagt 4920 
actatcatgg ctcctttaaa attttagatc cacaaatcaa taacaatttc ggacaaagca 4980
acacatatca acttaaggac ttacgagaga cgaccgaaaa tttggtgagt caggtaaact 5040
tctttttatc atagaataat gcaagtggaa gggattttgt ggatcatttc tccatttcta 5100 
aaaacatgat tttcagaccg ccaacattag aatcatcttg cagattgcta ggccccatcc 5160 
cagacctgct taatcagagt atgatgagat gggtaggtgg ggagaggaga gtaagggaat 5220 
ctgcatgtct aacaaatggg tgattctaat aagcctctct ttctaactca gctaccttat 5280 
ttaaaggtaa gagaattgag gccaagatat cctagcccgt ttcttcccca attccaccac 5340 
gtttcccctg tagaaaagcc taatcatacc aaaactagtt tttataagtc cacacacttg 5400 
tttgtaagac cacattttaa gattttgagt attttcagaa tttacgttca tcttgtaagt 5460
atattgataa agacaaaaaa ccagacttat tttgtsagtaa tcaagtcaaa tgctaataat 5520 
tttgttaaag ctaaagtgca agactgctcc caaaaagaaa aaaagcacac tcagttgtat 5580 
aatcattcca ctcagaatgc ccatgaactc tcactcaaaa actaggttca aattaatttt 5640 
tctaacaagg aagcacagaa gcagagactt attttaaaaa gaaagaaatg acaaatgtat 5700 
tggtttgttt taatcaaaga accattttta agacactttc tttcccaaat catctaccat 5760 
tttttcctgt catcatttgc tctttgtcca tagtatacct aatggcatca tatttacaat 5820 
aatattgtag agtttataat ctctattttc agttaacatt aaatcattca caatttctta 5880 
attttgtggt ttcatctttc ccaaccaata attaatgtct acagattgat atagattctg 5940 
cattctttca catgcagagc atcttataaa agagcatttg caatcagttc ttaagttatg 6000 
ctaggatgaa cggggagcct gcaccaatac acccaaatac cttctctact cctccagtcc 6060 
taagtgactc cacataacct cctcgatgca aaaagagaaa actcttaact tgccttagtt 6120 
aaaaagataa acacaccttt gaatgatgga aaatgttaca atttactggg aaattttgaa 6180 
atttgtttca tttatatttt atggccaaca ttactgctac tgttgttgtt gtaagttaac 6240 
taggcaattc tgtctttact gaagtaaacg gacaagaatg caataggtct taaaagaagt 6300
gagagaaatg cagaggtgca tgttgaacag aaactctatt taaaagtgga gttttaagtt 6360
tcacctaagc atgtgttcct tcaaaggcta aggctaagtt aagtaaggac acattatcat 6420 
catgggtacc tgcaaggccc ttctctggtt gtcattattt atttatcctc ctttatcacc 6480
atagcataag cccttaccct ccccccttgc aggaaatcat tctatgtttc atgtggtatt 6540 
cttttgtttg tattcattct tacaaaaata tgttttgcta ttttgcgtac acttgctttt 6600 
aacttacatt ttgtgttata aatcactttt gtttcatctc tttttactga gaacttttta 6660 
aaagatatat gttactaaat atacctttag tttattgctg ttagctgcta attcatagtg 6720 
tgtatcttcc atatttacct gcctgtcatg ccaagaaatg ccacactaaa cagactccta 6780 
cttaccccct tatagaccta tgcaagtact tctggaagca gaattactag gtcattgaat 6840 
gtacatatac ttaacttgac caattggtgc aggtttgctc ttcaaaatgg ctgactcagt 6900 
gtgcacgccc atctacaatg catgaggatt tctatgtccc cacatctaac caacacttag 6960 
tgtcttagta tgtttaggct actacaacaa aaaataccat aggctgggta tcttaaacaa 7020 
caaacaatta tttctcatag ttctggaggc tgaagattcc aagatgaaga tgatcaaggc 7080
tctagcagat gtctggtgag agcctgcttc ctggttcata gaataccatc ttgctgtgtc 7140 
cctcatggca gaagccataa gagaactttc ttttgtaagg acactaatga ctttcatgag 7200 
aactccaccc tcatgaccta actatcctcc aaaggcccca tctcctctat catcggtttg 7260
ggagttaagg tctcaaaata taaatttcag gggaacacaa acattcagtc cacagcactt 7320 
ggtattattt ggctttctaa atttgccacc ctaatatgta taaagtagta ttttatttgt 7380 
gatttaattt gcatgtttct aattactaat gagtttgtgc attgttacgt ataattatta 7440 
actttttgga ctttcatttc tataaattgc ctgtacatat tatttgccta tttttctgtt 7500 
aaacttgctt tttcacctta tttgtattgc tttgcagaag ttctttacat tttctggata 7560 
ttgatagtgt gttggttgtg gacactgcgc ttatccattc tgtcttctac taatatggac 7620
cgtgttgttc tttatgaaac cgaaatctgt aactgaagta atcatttttt cactgttttg 7680 
ccttatgatt gtattttgaa gcttttcttt aagaagtcct tcttcccttc taagacataa 7740 
aaatatttta ctatgttact tattaacctt atagttttat cttttacatt aggtctccaa 7800
tacatgtgga atccaccttt ggatgtgtta ggtagattca gttttttaat tcatatagtg 7860 
agccagtttt tgaatataac tagttaaaat atcttggctt ttcctaatat atggtattat 7920 
tattgagttc attgcatgca tttcttggca cctgggtctt gcagaaaagg aaacatgaat 7980 
ctgtctcctc aaattgcttc caatcttttt ggaaagatgt gagtaacaca catggaattg 8040 
aatatcatga catgatataa ttaagggcta aattacatgt tgaggacagt aagtacagaa 8100 
aaacttcaaa accaaacaag ggttcccatg gtcagaaaag gctttatatt attttacctt 8160 
tgtttaaatg agacaggtgt ttttctcctc ccatcccgca ccaggttagc tttagaagaa 8220 
ttacaggaag agtttatgcc tcatcctgag ccacacctgt ttgttgttgc taaatcccaa 8280
tgaatacaac cagattcttc tctctgtcct atatgggtgc taattagaca accaaggaag 8340
aacaggttgc acgtcctgtt cttcctcaca ttgggcttta ctgatttgaa tgcaaattga 8400 
gatgcaaaag taaaaatgag ttcatattta gatattgcta taatccgccc ctgttccctg 8460
agatagtgga gcagacatat ctcatctctc atatcattct tcagagaagg gtccattaat 8520
cagacattac tgatgtctga ttactgccgg ctggccatcc tgcaggtgga gaagcatggc 8580 
atccagcaga aactgacagc atgcactttg agggagggaa ggataagcca ggaatttatg 8640
ctgaataagc tgcctaagta tacatgttca ataagttcta ggggaagtca caaatactta 8700 
tgaaaggaga aacataacta tgtgcaattg agctttatgt ctcttcatgt gttgcatgtt 8760 
caaaaaatgg tggcattagc atgatccaag ggtggagttt tcagccattt gatgttcaaa 8820 
ggtgaagcag aggacacaaa acccttacta tgcatcctct gtgagtcagc caaaaccagt 8880 
ctggactgct agctagatta acaaagaaaa aaagagaaag aagatacaaa taagcacgat 8940
cagaaatgat agaggtaaca ttacaaccaa tcccacagaa atacaaaaga tcgtctgaga 9000 
ctcttatgaa cacttctatg tagataaact agaaaatcta gaggaaatgg gtaaattcct 9060 
ggaaaaacac aatcttccaa gattgaatca gaaagaaatt gaaaccctga acagaccaat 9120 
attgagttca tacttaaatc agtaatttaa aaaacttacc agccaaaagg aaaaaaaaag 9180 
gcccaaacta gatggattca cagccaaatt ctaccagacg tacaagaaat agctaggacc 9240 
aattctagtg aaactattcc aaagaattga gaagagactt cttcttaaat cattctatga 9300 
agtcagcatt accctaacgc caaaacctca caaagacaga atgaaaaaag aaaattacag 9360 
gccaatatcc ctgatgaaca tagatataaa aatcctcaac caaataccag caaaccaaat 9420 
ccagcagcac atcaaaaagt taattttcca aaatcaagta ggctttattt ctgtgatgca 9480
agactggttc aacatatgta aatcaataaa tgcgatttac cacataaacc gaattaaaaa 9540
caaaaatcat acaattagcc aggcatggtg gctcacactt gtaatcccag cactttggga 9600
gaccatggtg ggcaaattac ctgaggtcag aagttcgaga ccaacctggc caacatggtg 9660 
aaaccccatc tgtattaaaa atacgaaaat tagccgggca tggtggcagg tgcctgtaat 9720
cccagctact cggagggctg aggcaggaga atcacttgaa cccaggaggc agaggttgca 9780
gtgagccgag atcgtgccat tgcactccag cctgggtgac agagcaaaaa tccatctcaa 9840
aaaaattaaa aatttaagaa aattaaaatc atacaatcat ctcaatatat gtagaaaaag 9900
cttttgataa aattaaacat cccttcataa taaaaacact tagactaggc atcgaagaaa 9960 
catacttcaa aataataaga gccatctgtg acaaacccac agccatcatc acactgaatg 10020 
ggcaaaagct ggaggcacta tccttaagaa cagggaaaaa gacaagaatg ttcactctca 10080 
ctactcctat tcaacatagt actagaagtt ctagaaagag caatcgagca ggagaaagaa 10140 
ggaaaatgca tccaaatacg aaaagaggaa gtcaaattat ctctctttac tgacaatatg 10200
attatatgcc tagaaaaccc taaagacttt acaaaaagtt tccaaaactg ataaacaact 10260 
tcagtaaagt ttcaggatac aaaatcaatg tacaaaattc agtagcattt ctaaacaata 10320 
atgtccaagc tgagaaccaa atcaagaaca caatcccatt ttcaatagcg acacacacac 10380
acaaatgaaa tacctaggaa tacatctaac caaggaggta aaagatctct ataaggagaa 10440 
taaaaaaaca ctattgaaag aaatcggaga tgacacaaat gaatgcaaaa acattccatg 10500 
ctcatggatt ggaagaatca atattgttaa aatgtcccta ctgcccagag caatctacag 10560 
attcaatgct attcctatca aactaccaac ataattttcc acacaaagtt agaaaaagct 10620 
tttgtaaatt tcatatggta caaaaaaaaa aagccccaat agccaaagga ctcctaataa 10680 
aaaagaacag agccagaggc ctcacattat ctgacttcaa actatacttt aaggctacag 10740
taatcaaaac agaatggcat tggtcaaaaa cagacatata aaccaataga acagaataga 10800 
gaacccagaa ataaagccac acatctacag ccatcagata ttcaataaaa ttaacaaaaa 10860 
taagcaatgg ggagagaact ttctattcaa taaatggtgc tggaatagct agctagtcag 10920 
aagcagaaaa atgaaattgg actcctatca ctaaatacaa aaactaactc aagatgcagt 10980 
aaagaattaa atgtaagacc acaaacaatt aatacaagaa ccctagaaga aaacctagga 11040
aatactgttg tagacatcag tcttggcaca gaatttagga ctaagtcctc aaaagcaact 11100 
gcaacaaaaa caaaaattga taagttggac ctaattaaac taaagaactt ctgcacaata 11160 
aaagaaacta tcaacagagt aaacaaacaa cctacagact gggagaaaat atttgcaaac 11220 
tatgcatctg aaaaggtcta atgtccagaa tctgtaaaga acttaaacaa ctcaacaagc 11280
aaaagaaacc aagtaacgcc attaaaaagt aggcaaagaa catgaacaga tgcttcacaa 11340 
aagaagacat acaacgcagt caagaaacat atgaacaaat gctccacatc actaattatc 11400 
caagtaatgc aaatcaaaac tacagtgaga taatatctca taccagttac aatggctatt 11460 
attaaagatt aaaaaaataa catgctgatg agactgcgga ggaaagagaa tgcttaaata 11520
ctgttggaaa cgtaaatggg ttcagccact gtggaaagca gtttggagac ttctcaaagt 11580 
acttaaaatg gaactactat tcaacctagc aatcctactt actgggtgta tacccaaagg 11640
agtataaact tttttcccag aaagacagct gcactctcac attaattacc acagtattca 11700 
caatagcaaa gatgtggaat caacctagat atccatcaat ggtggattgg acaaagaaac 11760 
tgtgagatat atatgtatat atatctatat ataccatgga atactatgta gccataaaaa 11820 
aggatgaaat catgtccttt gcagcaacat ggatgtaaca ccacaaggaa ggcactttta 11880 
tctcctcttt acaggtaaga gaaccaagct tctgaaatta aggtccatag ctggaaaatg 11940
atggagggga gatttgaagt catctaggca actccacaca tgtgctcttt ccactaaatt 12000 
gttctactgt caggaaggga ctcagctaag acagaagata aaattattaa aatctaaatc 12060 
aattcttctc tcatttcatt ttttaaatcc atgaagatta taaatcctct atgctgtgct 12120 
agctaacttt ttcttgacag atacattagg tatacttatt agagaaaaat attctctttc 12180 
tcatttccct gtatcagttt ttggtgagga aggcaaaggt aggaggaact gtaatagaga 12240
aagatgaagg aagctgatgg atatattgac atgtgtatgt acatctagtg tgaacaatct 12300 
atagttggaa gaaaggtgtg gatgggtatg ctttttgagg gaagtttttg agaaaagaag 12360
taatatgaac tatttctaaa tttcctgata aagttgtaaa tacagcatag tcttcacagg 12420 
agaatctatt tagtttatca tcatcattca gcaaatacag catgatgtta ggcactataa 12480
aaggctaaga aaaatgattc tctctctctc ataaactaat ccaatttaga gatttagaag 12540 
acaacaaatc tggagaggac atgaaccttc taaataatga ccttcccttg ctttgggtat 12600 
cctggtttta aatattttta gtacagcttt aaatagatcc aaatgagata ttttcctctt 12660 
ttacaaaagc aattcaaaga tctaggtttt tgttgtacac tgagaattaa tacttttttc 12720 
tttaaaatcc ttaattgcaa atctttaaat tctataaata ttttgccttg tgatctcaga 12780 
aatataagcc aatttgggat atggatatct aatatattgc tacttgttac acgtgagtag 12840 
tgacagatgt ctgtccattt ctttctgaca ttccacaaag aaacactgaa gaaggaccag 12900 
tgcaatcaaa gaaatgactg atggcatcac aaaatatcac atcccatttg atgatctgat 12960 
tacctttttg tttagggtga tcagaaagtc acagtttcat ggcaccctcc acacccacac 13020 
accttgtatg acactggatc caactgcttt ctccaataga cacagcactt aaagatgtgg 13080 
cagttaggct tgaccccaag aaggccaaaa agccttctgt gagcatcact cagtgctcag 13140 
gttgactaag ctctatccag gcttgagaga atggttcata gctgacttct tggatccaaa 13200
aaaaaaaaaa aaacacctag agttttatac agatatgata cgaacttaaa aggactgcac 13260 
taaaaactac caagattatg attcttattt ttggagagta aagaaaatag gctgcctttg 13320 
gagaggggtg caacagtttc tgatcctctt acaaactgct tgctgcccat cagtgggtag 13380
gaggtcttag tgagaaccta cctgcatgct catcctgagg taggcactgt gaaggcgtta 13440 
acaggctctg aagctacatg gccctggttt cagtgaactc tgtggtgtca acttgggcaa 13500 
gtcacttcct cttctatgaa acgtgaataa tcatagtact caccttagag ggctgatttg 13560 
aaagcaaatg agctcaaaca caatgacatc tgtgcttggt gcatatatgg cagacaacag 13620 
tgattcccac tattataatt attacagtct taccaaggag gagctttcca caaataatca 13680 
attacctaaa atgtccaaaa acaggaaaaa aaaatctctt ccgataattc atgtgtaatt 13740 
ttcttttttc tctaggagca ttgatctcaa cctgatgtaa agcaagcact ttaaaaagtc 13800 
ttataaaatt ttcctggtaa atgcaaaact ttctgataaa taaattctca cctttttatc 13860 
aatttgttaa ttcaacaaaa atatactaca taccaacagc atgcaaagca ctatgctaga 13920 
ttttatagac tatgaaaaga taaattgcca tctctatgca taaagggttt gccatttaat 13980 
aaaagagact atatatttgc ataaatatat agtgaatata ttgcataaat atataatata 14040
tgtttacatt aaagaataaa aggtataaga gggataagaa aaattgagac agagggaaga 14100 
caggtcagtt tgagattaac gaatatcccc aaagaaggta ttatctgaga ttggccttga 14160 
aggatagttg tgattcagga acacagaact tgcagaatga gaaggttgtt acagaccaaa 14220 
ggaacagcct gagaggcgtg agtatgcagg aaaatgaggg ccatgcctga aagtactggt 14280 
ggtgttgaag atggagccag gcaagttggt cacagaggga gaggaccttg aatgtctaac 14340 
attgtggaca gaggctcaaa ggctcaaatt ccctattttt accttgagtt caatccttgt 14400 
ggcaatgaaa cctcagtgaa gctttattta aggctaaaag tgtcttttaa aaatccctct 14460 
tatataatat cctttgcatg ttactcttgt tgtaattagg agaaagcaat aggatctaaa 14520 
gttttttttc acagcatggt tttggtttct ttaattctaa ggagctcacc tggtgttacg 14580 
ttggaaaaaa cagcttttat atctcattta tattccatat gccagtctgc agtgacatat 14640 
ctatctgagg tttacagtgt tagccacaaa acactcccta agtgaataca ttgactgctg 14700 
taaggggagc cagtcaggaa gcacctgcag agaaaagcag gcaacatgta taaacagagt 14760 
taattcagga atgaaagctg aatggctggg cgagtctgtt tgtttgagtt gacagcctct 14820 
ccctcactct ttcattaaat atccaactaa ccttcaattg ccctcttgga acttaatctc 14880 
agtgtaattt ccagcatgtc aaaattatca agcagaaaga gatactaccc tgaaagaggg 14940
tcttttgttc aatgctagga gacaaactcc aactacaaaa ttctagaaat gccctaaaga 15000 
gagagatagg atagatttac aaattgctaa tgctattagg ttgtatagat aacaatagat 15060 
ttataacaac ctggcacaca gctttaaata tataagtttc tctgaaactt ctgggaactt 15120 
ggaatgccag aacgttggca aaaagaatgc ttctaataat gaaagccatc atctgccatg 15180 
gaaacaattt cagggtcttt agaaagctag tttatacata agctccattc tacaataaaa 15240
cttatgttca tgttttttct gattttcctc ctgctgtaaa ttcattttat cagaattctt 15300 
tttaccagtc cctctgcccc atttctcaaa gcgttgtcct cagactacct gtatcaccta 15360 
aagattctaa ggcctcctcc gatgtagtaa atgagacttt tctagagaga gagtcctaga 15420 
attttataaa gaaggatcct ttttattatt gtgatcacca aagttacttc tgcctagatt 15480 
cttctcatgt tatttttaca gctcctatct tcccagacaa cctaacaatt caaagataaa 15540 
attggtgctt ggtttagaca ttcatagcag gcacggtgcc agattgatga tgtcatccag 15600 
agtcaaaaac ttcatccaat gccttcacca aaaagttaca aatggccagg aatcaaatgt 15660 
ggttgaactt attcagaggg taattacaaa acaaacttct ttaaataccc aactgctatt 15720 
tgcttttttc cttctaaatt gtatcacttc tctccctgtt ccattttgtt tgccttttta 15780
ttttttggaa tccctcacct ccatactgag tagtagagct ggctgtgggt gatgagagag 15840
aaattgttat aacaaagtca ccctttcaaa aacatgtctt ccaaaagaat tttgtttcta 15900 
gcagataaac cccacaccac ctcagctaaa tggggctttc tttatttaag taccaataaa 15960 
gacatatttt ggatactagc aatttatttt ccaaatgcta tctttgatct taagtttaag 16020 
gctattacca aatctatatc tctacaagtt ttatacttta ggtcaataaa ttacttgata 16080 
acttattact atgtgttcta caaaagaaac cgaagtaaaa tttacatcac atttaacagg 16140 
gtggttgtgt gattgagtgg gaagaggcgg accctacaga tagaagactt gggtttcagt 16200 
cccagcttac tagtatctgc gtgatgccag ggaaattcac ataatgcctc tgagtcacag 16260 
atttctaaca ggaatgaaga tacttcttcg cagaattgtc attagagtta aagaagataa 16320
caaataatgt ggttcctgat gaggtattta tgaattcctg agcatgctaa ggaagttata 16380
acttgtttct tgatccctga aacagctttc cctatatttg tgtgtgtgtg tgtgtgtgtg 16440 
tttcagtcat gcaagttggt ttttcttctc attccttgag aatttaggat attttgtgcg 16500 
cacatttggt tcttctgtcc aacatgaact gtagtacctt acccacattg agatgacact 16560 
atttctacca agtgagtgct aggggatact gcaagccgaa tgccaggtgt gagagaccac 16620 
agcatcacaa taccgtggca gtagattaaa gctgtgcata tggactaaaa gcagtggctt 16680 
tgcttctcct accttggtga cataaactga gtaacaaatt tgacctaata ctggaatacc 16740 
acctaattct tttttcctcc ctgatttacc ctagagtcca caattgacaa taatttaaaa 16800 
attttggctc tctcttaaat ccctaatgcc tcctccttac accttacaag caaagacctg 16860
cagagctaag acctgtaatg ccaggatgga ggctagagga ccatcagcaa ttaactacca 16920 
aaacttaccc aacattttat atctgtttaa ccttcatagc cttatgagta gcagatcaat 16980 
atctttgttt tacaggttag aaaactgagg ctcaaattga ttcagtaact ttgccaagat 17040 
tgcccagttt gggaaaagta gtatacgctc aaatccagga ctgaggcagg gttttctttg 17100 
tcaccactca aagcctctct gaatatccta tctctgctct gtatctctct gctactcctt 17160 
ctatggtgtt ttagcaagat atcttctact ccagaaacct actctagcac agtagaatta 17220 
cttgggtagg ttttttaaaa atatgagtgc ctaggtcccc tctagaccaa tcgaaaccaa 17280 
aattcttgga gaggatccct ggcatccata aattttttta attcatcaaa tgattctgtt 17340 
gcactgtgaa agctgagatc caccaattta aataatgatg ttagttctgt gaaaaaattt 17400 
ttgattgctt taacatttaa tcaaggatat attcctatta taaaatatat tattaacaca 17460 
tagtttctct cttgttgtgt aacaggtgga tgagatattt atagattcag cctggaagaa 17520 
aaattatatc aagaaccaag tagtcagact gacgtatgta tgtttgggca aaggtggaat 17580 
cacaagactg gagggaaaag gaacaaagga gacagggact ctcatgtatt gtatgtctcc 17640 
atggactagg cttttggcta gaatttttca taaacattac ctttaaagca gtcttgaagt 17700 
atagggctga ccaccgtttt gtcaacaaaa agactaagat tcaggaaggg taagaaatat 17760 
gttcaaagtt caccaactga cagtttccca aagtgacaga accaggaatc aaaccccatt 17820 
aacttattgt gaggcctgga acctaccaga acccatgacg tggggaaaac ccagcagctt 17880 
gtcgttgcat gcaccaagtt atattatgtt gacaattata ttatttcaac cacgttaagc 17940 
aggcaaactt ggctataaaa tgggttcaca aattttacct gtaatgtaac cgaatgacat 18000 
aaggcatgcc taaacaaaaa gatattcctg ttgtaataaa ttttctttct gtcatggtgg 18060 
agggggaaga ctcatatcag ttgcagatat tgctcagaag tttcaattgt gttattttga 18120
aaaactacat agcagaacac gcatgtcata tacacaaatc catgagcctg tatgactcat 18180
atttcttaaa gataaagaaa aataatatat tcagattttg atttatttga agaaaataat 18240 
!!f C tt tcaccaatag actaataatg ctttgttggc aggtgtactc aaagttctct 18300 
atgtcttgac tgagtaacta grtgacttccg taaggatttt ataacataaa ttgggtaatt 18360 
^°^f^ aC tta ^ a 999 a aaaagcatat aaatgctaga actttctaga tttcatgttt 18420 
tctgttttca aattctcctt taccatatta ttgtagcaac attattatac tcctgtgaac 18480 
tcctttggat ggtagccatc actatataat acctggtaaa aatgttaatt cctcagattt 18540 
aagaagtaaa attagtcatc tgtttgccaa tttgacataa aattctagtt atttagatct 18600
ttatattcca gagcctaaat gaacaaaaat acataaattg tctcagaatt tccttttagc 18660
caaaagattc agggagatgg gcctctagag tttttcacag tttttttttt ttttgtaaaa 18720 
aaaaaaaaaa aaaaaaaaag gagagataac agatcaatat atattagttt caaggttttt 18780
tgtttttttt tttaaacaaa aacctgtaat tgcttttcct attttaacag tatttaaaag 18840
tttagttcct caggtaacag aacttgaacc tgtttatatg atcaaagttc aagaaattgg 18900 
gcatgtttaa tttggagaag actcggggac cacaatattg ttgtcttcaa atatttgggc 18960 
tagaggagga aattatttta tgtatgttcc aactggtaga cctaagcctt atggaatggg 19020 
agatataggg agacatattt caactcaaaa tgatgaactc ttaaaagcag agctgaccaa 19080 
agagaaacaa gcctctttag aaaattaaac ttactatctt tttaattact gcactgtcat 19140 
tagagggcca attgtcatgg accctgtaga agtgattcag gtatcaaata tacaattgat 19200 
tagcctaaga aaacatgaag gcttcttcta actctcagag cttgtaattt tgatgatgat 19260 
tttttatatc tgtcattcct agctgctgta acaatccttc aaattaatgg gggaaatgca 19320
ctgaaaacat aatgaaagct agaagaggga acatatgaaa tgaccttggg tcagaatgac 19380
atgagaggat cagcacttga cactctcagc aactgaggga tcattcaggg gaggaagata 19440 
caggtaagac tgaaggacaa ttccaggtgt attctttgaa aatgtacctt tcttttgtgt 19500 
gtcacagtcc agaggaagat ggtgtgaaag tagatgtcat tatggtgttc cagttcccct 19560 
ctactgaaca aagggcagta agagagaaga aaatccaaag catcttaaat cagaagataa 19620
IS^rr 1 ? a ^ c "9cca ataaatgcct catcagttca agttaatggt aaggaggtcc 19680
ccttctatgt gatatgaagt tgtctattag gtccatgttt tgacgaatct caaatttatt 19740
tgtcattatt tccatttcaa ataatagcta gaattcagat gaaaaaattc aagttaaaga 19800 
tgtgacattt caaggtgtat tagtctctaa cgtaagcatg tctgaagtta gtcatccagt 19860 
ggttttcccg acagtaattg attggcactc atcccaaaat ataggcaagc atttacaact 19920
aacagagagt taatcccacc caggcactgc ctccatgact aagcaagtga aaatactagg 19980 
ggtttagcaa taattgtttt tctgggtggg accttcctaa aacacaaatt catgtgttgc 20040 
catactttta ttgatagttt ctatatatgg tgatatacaa tttttgttag ctttttttcc 20100 
tatgggcatt tgggaaaatg gcaagccaac tttgaagttg ttagagtcat tttaccatta 20160 
^J^™ ^f? 0 ^ 0 ta 9Haaaaca tcactgaaac tatgtgtaca ttgttccact 20220 
tttctctttt tttttgttca cccttagccc attataccat tatcacttcc ctcaattaag 20280 
gagaacaaac ctttatcaag gtctatctct atggccttta ccttaagtaa ctaatttctt 20340 
tttatattcc agtgacgtac gcaaattcac ctttatagaa gtgaaattca cacaaaaaga 20400 
gttgaggaat tcagtaatta aaaggagcta agaatcaaat ttaaatctct aatttcttaa 20460 
aaggctccaa ttaaaaaagg tttctatagt caaacacatc ttaaaaattc tggctttgat 20520 
actcgtttct tggaaattct tccttatagt gtcatattaa aaattctaag gcagccagct 20580 
agagagaaac ttgtttaccc tcgtccgcta agctgtttgc acagcatctt cttccaacag 20640 
acaagtatag atttctccta caaatttcaa tggataccag acctaagtgt tacagaagag 20700 
attcagggca agcgattttt atcagacatg aaacaggaca ctctgccctt gtaagggtct 20760 
agctgacact tcaagaggaa accagataag gaagtaaaaa atgtgaggta atggaatggg 20820 
cagatgtttg ctgatgtgag aacgagtcag ctacttaggg aataaagctg aggacctctc 20880 
ccagccagaa gggaggaacc tgacaagtgc ttaatccatc ttctttgtta gatggggaag 20940 
caaatgaata gaagttgtga aacaatgggc attctgataa tttacatgat gctttctgtg 21000 
taatttccaa taaatagtta atttgtcagg aatgtaaaag cctgaactat ctgaaaccag 21060 
tl^f 3 ^ aaa "^ tca ttggctgcct ggtctttttg ttttttgtag gctcagcttc 21120 
^ a ^ aC " Cag cttattttaa taattgtact aaattaaatg gtaggatatg ctaatggaga 21180 
acctgatttg agagtcacct gaggctgggc atggtggctc aagcctataa ttccagcact 21240 
ttgggaggcc gaggcgggtg gatcacctga ggtcaggagt tcaagaccag cctggccaat 21300 
atggtgaaac cccgtctctt ctaaaaatac aaaatattag tcaggcctgg tgacgggcac 21360 
™ M ^° a9ctactt 99 gagactgagg gggaagaatc acttgaaccc gggaggcgga 21420 
ggttgcagtg agccaagatc gcgccactgc actcaagcct gggcttgaca gagcaagact 21480 
ccatctccaa aaaaataaaa aataaaagag ttacctgacc aattctaact ccactaagtc 21540
accacaggac cacccaaata attggctcat gcctttgtct tcattttctc atctgtaaaa 21600 
ttccaatggt aatgtttgtt cttcctgaaa tcacagagag attataacga tatacaagga 21660 
aatagaaaac acaatgtgaa ataaagaggc tgttactaat gagaaaacta ttatgttgtg 21720 
^^^ C ?? aaaCCtga aa tcattaat ttgagtgatt gactagtagc agaaagatag 21780 
atccttgaaa gtttcagaat gttcaatgta gaaagaacag tgtttgttag tgatatggga 21840
SS" tgttgctttt ctggccagaa acctctgtgg ccagtggttg gtgcctt^c 21900
ccaagttttg ctctggccca ctgggcttgt tctgcccact tgacctggca gactgtgccc 21960 
accttccgct accagcctgg atcccatgcc caccaaggcc aacccaggca tggagctgtg 22020 
agggttgtct gagcgagcac agggtctggc cactgcccac agccaggcac actggctgca 22080 
gcatgacggg cagctccagg cactggcaca ggtgtgctgt ctctctgtga ggctgtggct 22140 
ggacaaagct cactgcaagc agcttccctg gcaggcacct gggaatgt^g tggcacccag 22200
gaagcttgga gatgccagga actgcagggt cccaaagagg gagtcacaac cctggcttgg 22260
ggagctccca ggtctgggat ccctaaaggg ctgcagcttt tctctctttt tacccacaat 22320 
gtggccagca aggggtatgt ttcattcctg tttgtgttac agctctttta gtcttgctat 22380 
ttggcaggtc ctgagttctt gtcctgagac caagaagaat gaggtatgca gacaagtgga 22440 
gggtgagcaa gacgaagaaa ggtttactga gcaagagaac agctcacagg agacccacag 22500 
tgggcagctc ctcttcatag ccagggtgtc ccaacaagtg tccagctcct agcaaagagg 22560 
aggccctgga ggtagaagct cctctctgca ggcaggttgt cctgttgagt gttcagcttt 22620 
cagcacacag taggcagtag gccctagagt ggtctatctc ctctctgcag gcaggtagtc 22680 
ccatggtctc ccagtcacct ctccatctgc aagggtccaa tgctgcctcc agcacctctc 22740 
tgcccacccc tccgtgcctg accaagctgc tcccccacca gtgggcaact cagcccagcc 22800 
f^" 9t99t ^? c t«=ccagg gtggcaggct ctggggggct cccagggatg ggctccaagg 22860 
actgtccacc ttchccccac gccctccctg cagtggccat ggtcaagaat ggcaatgtgg 22920 
ggccaggttc cggagcagga gaggctccag gcctgggagc aggtcctgcc tggtcacgtg 22980 
aggttggggg tggcacagtc ggctgcctca gggatgtggg acacagggga cccaccacca 23040 
tcactgctac tcccgcatcc gctcctgcta ccactgctcc agacagcctg tagctgccat 23100 
cactagcact taagaaaggc acattcagtg gacagctcag gaaaatcttt acgtcaattt 23160 
tttataggca aaaacattgt ttcctgggca aacaaaattt atggactacc aataaataga 23220 
^!? ta9a 9 attcta 9*t taagtctaga aataatcctg tagcccaaga tttatttata 23280 
atttgtcaag aatctgtatt ttgttttgac aaaaaaaaaa ctgtgtggtg tgggtccttc 23340 
aggagacaca gtgtgacaaa gcaaagctaa aatcaacttc tttgcattgc aaacaccaag 23400 
gctgtagtca agcagctcac tgcctatgtg tcagatgact ttgcttcatt tttcatcatg 23460 
atacttgtag tctatagagc cctgaatatt aactagcttt ctcccaactc agaaocgtgt 23520 
taggaggtgg ttgctttcaa aactaaagtg ttaatgttta tttccatttc tataccagga 23580 
^ a ?^ aaat cttt 93 fccaa aattagaaat ctttaacaac tagttacttg tgtattgaoa 23640 
gtttgtttcc aggtgtaatc attctccctt aaaatccggt tatattcacg accattatac 23700 
Z„llt C 3 ft atcattcct 9 gaaatggcta acttgcatcc tgctcagact aagttgacaa 23760 
agcttcaatt gaagaattct aactttatgc tattttccac tttattgcat tacaaaggac 23820 
aaaatatata gttttcttaa aaatgaaata aatttactgc cttaaactac atttgacggt 23880 
aaactgagtt ccttccatag aataaccact aacagcaatc gatggtcctg agcaattgac 23940 
tcttcaccat acaatgattt gggatgcctt taagggtata tttgaattga atattttoaa 24000 
aagctcccac tttgtagagt ttatcatoac tagtttcccc agtggaattt gtagaaagtt 24060 
agtagaatga aacaatctta ttttgtataa tgaggaatag aatactgaga atgtgtctga 24120 
gaaacatggc actggtagga aaaagtaaac agtttattct catctgctca ataagctaag 24180 
tcattttaac ttgaaaatca tcaaaatttt catgaaacct tccaccaact ctatttttcc 24240
ccagctttag taagatataa ttgacaaata aaaattgtat accgtataca acatgatgct 24300
ttgatacatg tatacaagtt taaatatttg tgtttcctta gtcaaactcc tcactttttt 24360 
ggaagttgac agaatttaat cttggattgt gtccaataac tagcttttac cactattcag 24420 
catattttgg ataagaaaca cataacagtt tattctttaa aaaagcaatt ttactattta 24480
ggaactgtgt ttaaaaagca ttttaaatat catttatgca agagttttca aggttttttc 24540 
attctaaacc ctttaaccaa aaaaaaaaaa aaaaagattt atgtgaaatt cgaagtaaat 24600 
agaagagatc aaagcagatc tgttctggct gaggctgagt ttgagacctg taagacagtc 24660 
tacttgccat atggcttggc tgtgtcccca cccaaatctc atctcgaatt gtagccccca 24720 
taattcccac atgttgtgag agggacctgg tgggagataa attaaatcat gggtgoagtt 24780 
tcccccatac tgttctatgg tagtgaatga gatctgatgg ttttataaga ggottcccct 24840 
ttcacttggc tcacattctc tgacttgctt gccaccatgt aagacatgcc ttttgccttc 24900 
ctccatgatt gtgaggcctc cccagccaca tggaactctg agtccattaa acctcttttt 24960 
f=" a Z aaa taccca 9 tct cagatatgtc tttatcagca gtgtgaaaac aaactaatat 25020 
™^ * ctctgtccca tttatccatc ttctgaagtg gaatgcaaag aagctttacc 25080 
ccgaactgct ggaaaaccat agttctctat taatacaaac tatttgtggg ctttagtcat 25140 
ccactatttg tgccttactc accoattgct tgtgatagta tccacctaat tagaggctgc 25200 
ctataagtct ctacaaaaac tgtacacaga tgttgttata tcagatagcc attctcctaa 25260 
ttaatctata tgttcaactg tctagaatcc atatatggtc agtatcctct gattattcct 25320 
ggtcattgag accaaocagg aaaatatcaa attatcacta tttgttttat cttctttttc 25380 
agcaatgagc tcatcaacag gggagttaac tgtccaagca agtaagtcaa gttagcttat 25440 
ataaacaagt tcaattttca catcagaaag gacattttca aatatttgct catacttgcc 25500 
? C * 9 ? T° tttga 9 a 9 at aa taactatt tgtacgatag atttaaatac 25560 
attttttttc taactcatgg actgatcttt tagtcatgtt caagaaaaaa attgccatgg 25620 
'^" C ^ gggcaatttg aagaaagcat ttatttttga ttgggaatat tggacttgtt 25680 
tttctaattt ttaaaaatgc cataaaatgt actttctgct acaaaataaa ataataagaa 25740 
agtaatcaat aggaaggaca taaaacccat tgtctgtgac tgacaatttg tctgtgaaat 25800 
atgctaaggt caggagttcg agaccagcct gaccaacatg gagaagaaaa cccatctcta 25860
ttaaaaatac aaaaattagc caggtgcggt ggcaggtgcc tgtagtccca gctacttggq 25920
aggctgaggc aggagaatca cttgaacctg ggaggcagag gttgcagtga gccaagattg 25980 
caccactgca ctccagcctc agcgacagag tgagactcca tctcaaaaaa gaagaaaaaa 26040 
atatgcttaa tagattcatc ttaatcgcta acagtggctt cattaaatca cttcaaatca 26100 
aattttgaaa gattttacaa aaaacagtga tgaatttgag caatgatgtt 26160 
^ 9 f C !^? t9aC tfc 9 caaacac cctaagtatt tttatocatg tgtttattca 26220 
Z ^ff tcttttaaca tctaccaagt gccagaaatt agaccaggag ttggtggtac 26280 
aaatl^ aaaacatgat -ctgctcta aaattagaat tccaaagfag agaaagltat 26340
actat^r 99 aa 9*atga aaataatgtg attaatgcta tgacagagga agtgcatagt 26400 
taaaa^t^ "gatcagag agtcagctaa cctgttctca cacagtaaga aagigaaccc 26460 
tgaaatgtga gagagaagag gccatgaatc cagtgacagg tggggtaagt gtcctqqqca 26520 

acaaqaSt? ££T"T ! 9t f tCa " Caa * taa 9 aa Scat't SSSS Tsll 
acaagatgtt tcttataact taatgatctc atcttttttc aggttgtqqt aaacaartto 
ttccattaaa cgtcaacaga atagcatctg gagtcattgc acccaaggcg gcctggcct? 26700 
ggcaagcttc ccttcagtat gataacatcc atcagtgtgg ggccaccttg attaqtaaca 26760
catggcttgt cactgcagca cactgcttcc agaagtL|t ^attgacct? aagtSqaac 26820
tatSctqaa ^T 00 t9a9tttt ^ catattcttg gtaacaatta atgtctc^aa 26880 
ttoe a ^ aa gtaaaataa 9 aaaaagttat ttcaggttct tttctaaaat aatgttacac 26940 
aatcatttcc tcaaa^ttt " gat "9 aa taagtaacag tcattatcct agtatccatc 27000 
1*11111 tca aagtttt taataaggaa actgtgtaaa gaaatcagaa ctattttgtg 27060 

tctccactta Ca ^ aataa ° afc 9 taccatt aatcttttgt caaacaa^ 27120 

taaat a !t~ ™ ft 9 ^ ctgtttctgc caaacacttg ggccagtctc atactgatct 27180 
aaccqtqatq ,MB ? ttt 9gaaattttc aataaatgcc ggaagttggt 27240 

9 a 5? ga9aact 3<=agatcaaa tttagagcat tgacatatga agatctgtgg 27300 
aatcagaaca gtttacaacc aaaatgagag attgctagca tgataaagac aggcacttcf 27360 
"tcccata^ CtC " a f tat -aaaggattc atagaggccc tcgggcclct caltgtgacc 274 to 
ttcccataat agagcatctc ttcacaatag tgacacaaaa gacaaagctg aagtgaaqaa 27480 

£££££ ^T 0 ^ taattgtttc tgaatgcata cattttatta JESg£ 27540 
a aaa t?a~t tUt ttaatcttac ttttcaagat aataaccagt catttttLc 27600 

ttcattoc^ ^r 9 ^ " a 9 attt 9 ttt ctaagtagat taactgtatc gcctttcttc 27660 
ttcattgcca attattacag taataacaaa gacttcttga gtatctctat ataataggtq 27720 

aaccttaota "T* -"^gtcc caggcagttg gagagctggg caaattlttg 27780 

aattctaacf * agataggCt a 9atcttttc acattctttt tgacctataa 27840 

ractctt^ t " 9ttacta taataaattt catttgccta ggagcataaa tctttataga 27900 
gcctSaatc atatacata * taagaatcta ggcttggcat ggtggctcat 27960 

aara—™ ccagcatttt gggaggccga ggcaagagga coacttgagc tcaggagttc 28020 
ccaqctactt ST <= a «993cat ggtggtgcat accStcatc 28080 

c^aqctat™ 9ggaggctaa <=9<=aggagga tcccttaagc ccaggagttt gaggctcctg 28140 
aaaata^C t^attttta 9 ° aC * CCa 9 c ^gagtgaca atgcaagacc ccatcttaaa 28200 
aaaatagtaa tatattttta aaaataatct acataaattc ttaatgtttq aaaaatotaa 28260
gagctcagta agctgatata ttagaaagcc agaaatccct tatgrtggtg tctgq^t 28320
SSSZS ? 9aa ^ taC ttt9CCaaa9 ttagccattt ttg?gg?Ig a 52E2S 28380 
ata?t aaa ^ a ~t?t* 9 Z atfc 9 aacacc aaatctatac tctattaact tclaccatca 28440 
acaacfca^ ct 99aacaac aggaaccaat tttatttctt cattcatata 28500 

cca?Ia««a all?* 1*1 Ctttttca 9 a "aaacataa aatgagggag aatatccaaa 28560 
agccctl^ 3 aa ^ aaa ^ a Cattact 9 fc 9 agctttagtt tgctaaggat aatgacctcc 28620 
coc^taat^ S C f 9C aaa 393catg attttgttct ttttatggct gcatagcatt 28680 
tqatto?^ Z5£T? Cattttcttt atccagtcta tcactaatgg gcattlaggt 28740 
aftaa^^ 9 tzTtf t aCCgaa 3agt gctagaggga gaggatcagg aaaaataalt 28800 
ataaq^tac K^"" CCt 999t9at gaaataatat gtacaacaaa accccatgac 28860 
acaagtttac ctgtgtaaca aacctgcaca tgtaaccctg aactttgaaa aaagtatata 28920 
taStatT " tatatatg catacatata tatgtgtgta tatata^gca tatSgtgtg 28980
tgtgtatata taaaaaaaaa tatatatata tatatatata Cataattacc tcatttttcc 29040 
agaaccaact tccagatgcc ctaccacatt ggttcttatt ctctgaacat tcgagacttt 29100

SS2S 22=2 S52= — J s— ™ 
2S=S SSSS StSSf* S3E= ™S I f 

S= S=52 255=5 =25 ™ 

aaaggactat gcagtcatct aatagacttt accacatcca ttcttg.ce? tcaagaatct 29520
actccccaga aagaacaaac atgtttttta aaaatgtaaa tgagactaca ttattctctg 29580
gcttaattat ccagtagatt cccatatcac ttcaataaaa tttaagcact ttatcatgac 29640
ctataaaaca ctctaaaatc tagtccctgc ttacctctcc aagctcaccc ccaaccattc 29700
tttcccttgt gttctgactg cagcccatcc aacccaagac cttgggattt ttgcctggaa 29760 
acttgtttcc ctcatctcct cacactgacc ctcttttact atgtcttagc ccaaatgcgt 29820 
tatcaaaata atcataatga cctgttagta ctctattccg ttaccctatt ttattttgtt 29880 
catagccttt atcaatgttt aagattattt atctatttgt ttgcttgctt tgatcctttt 29940 
ccttctctgg aatcttatac tcctgtgagc aggcacctta ggtcctgttc atcactttat 30000 
ccccagcagt tcagataagg ctcagcacac agatgctcag taaatatttg tggaagggat 30060
aaatgaatga tattttatgt gtattacagt tctaaaattc aatagttttg tattaaatat 30120 
cagttctaat atggcattta tatgatttta tctttcaaaa cattagcaat agattatatt 30180 
taaatgataa aagaaaacta taactgcagc caagtattct caggattgta tttctcttat 30240 
attagcctaa atgcaattaa tctagctcat atactttggg cagcttatat atattctgtt 30300 
aatttctaac cttttccagg tataaaaatc cacatcaatg gactgttagt tttggaacaa 30360 
aaatcaaccc tcccttaatg aaaagaaatg tcagaagatt tattatccat gagaagtacc 30420 
gctctgcagc aagagagtac gacattgctg ttgtgcaggt ctcttccaga gtcacctttt 30480 
cggatgacat acgccagatt tgtttgccag aagcctctgc atccttccaa ccaaatttga 30540
ctgtccacat cacaggattt ggagcacttt actatggtgg tgggtatctc aggatagcta 30600
acagagcgct aagccctgtc taaggcaatg tgatttcatc tccatcaata ttatcctgac 30660 
agccatttcc acacagtctg gttggattag ttagggttct tactttgtgt gacagaaatt 30720
caattcacat taaccagtgc agaataaaaa acaaagaaac aaaaacttcc acaaatttgg 30780 
ctcatgtaat ttggaagtca aaaaagtgta gtaagtttca cttcagacac aggggtttat 30840 
atgatgtcat ctggctctgt gtctctgaat ttgaattttt tgccccttct tttctctatg 30900 
ttggcttcat tcagagggat gctagcttca cctagtgtca gaggtggcta acaacacctc 30960 
aacacatcat cctcaacaaa gaaaaaatac atagaaagga afcatttattt cttttctttg 31020 
ccagaattca cattaatttc tatfcgttcca gctgtgtcta ggaggactca gattgagtgg 31080 
ctaactcaaa tattctttat gcctatgtag caaaatttgc ttcagtactg aagaagctaa 31140 
tttaagtgtg atggtgaata agaatagtgt agagataaat tgtcaaacta tttgtcccct 31200 
ctaaaagtat tcaacttgat atactaactt agtcttgtaa gaaataatga tgatttagtt 31260 
actgaatgtt ctaggcaatc ttagfcgagac acgctctgga ttctaacatg tggtccaggt 31320 
acatatgtat aacaaagcta gaaagtttct ttaacactgg gcttgagaaa atgcaaaagg 31380 
gctttctgag aatgactaaa tctatttgca ggattctata caatttattt acatacaaga 31440 
aattataaag aataagcttt tgafcfccfccag tctaccatta aggaactagg aataaccttt 31500 
cactcacata ggcaggaatc ggttttaggg tctctagatt ttttccagat gtcccatgtg 31560 
gttttgtttt atcttataca gagtgagaca tgcattgctt tctttaaggt tgtattacca 31620 
atcacagaaa atattaccta tggtttatta attctagtag atccagtgct gctgtaagcc 31680 
tgacacctcc ctaggtctgc actctcttgg atggattttc tctgaagata gggcttgcat 31740
tctctgcttc atagtggtgg gaaagacatc acaaatcccc tttggcttgg tgggaaaaat 31800 
cactttcagg agtttgagac tggcacagaa acatacctgt cataatgcgc tgtgagtggc 31860 
aacagaatct gacacttata gagcactcca ccctacttga acacggcctc tcttggtgag 31920 
tgacccacag gtgcttttaa tctattaaat agattaaatt aacctatcat tcttaatctg 31980 
ttaagtacat taatagatta aaagcagcca ttcgttactc accaagagag gctatattca 32040 
agtctgtaaa gcaaacctta agaagttttt taaaattgaa attgtacaaa gtatattctc 32100 
tgatcataat ggaatctaac tagacatcag taacagaaag ataacataaa aatccccaaa 32160 
tgcttaccaa ttaaaaaaca tatgtaaata aagagaatat ctcgaagaaa tttgtaaaaa 32220 
caaatagaac taaatgaaaa caaaaatata taaatatatg ccagatgctg ctaaaatagt 32280
gtagaaaggg aaatttatag aaaatgcata ttataaggaa agatatcaaa tcaataatta 32340
agttctcact tcaagaaact agaaaaataa aaaataaacc taaaacaaac ataaggaagg 32400
aaataataag aataagaata gaaatgaata aaattaaaaa taaactatag aaaattgata 32460 
aataaaaagc tgattatttg ataaaatcaa tattttgcta gaaatgtcat taagcatttt 32520 
tacagaagat gagatatagc tcagggatgt ccagaattta tgggctatgc ttttcatgac 32580 
ttggaataca ttttaccaac cagtttagtt tgctgaagaa gttgtggatt tgcactgtca 32640 
cctacttaca atacttagat tgtcagtttc accttactct tctcaccatt attttatttt 32700 
tatttttatt tttattttta ttttgaaaca gagtctcgct ctgtctccca ggctggagtg 32760 
cagtggcgtg atctcggctc actgcaaact ccgcctcccg ggttcacgcc attctcctgc 32820 
ctcagcctcc cgagtagctg ggactgcagg cgcccaccac catgcccggc taattgtttt 32880 
gtagttttag taaagaaggg gtttcaccgt gttagccagg atggttttga tctcctgacc 32940 
tcgtgatcca cctgcctcgg cctcccaaag tgctgggatt acaggcgtga gccaccgcgc 33000 
gccaggccat gaatgttttt aattgatgat atagtaggca atataaatgt gtgtgtgtgt 33060 
gtgtgtgtgt gtgtataata tatataaacc aattgtattc aaataacaga ataatttgaa 33120 
aaatctctta gcatatttct gagttacaca cttaaatctt ccgagcactt ttaaatatgt 33180
gtttacaaac atttcttcag aaataaatct tggaaatcgt cttctaaaga aactggtgta 33240
ttagggtttt ttcaaatgta cttagttttt tttttaattg atgtataaaa ttgcatgtac 33300 
ttaccatgtg caacataatg tgttgaagta tagtatatgt acactgtgag tgttaaatct 33360 
agttaactaa gaagcgtctt attttacata attatcattt ttgtggcaag aacacttaat 33420 
atctactctt gtagcgtttc tcaagaatac gatatatcaa cagtaggcaa ccagaagctQ 33480 
ggggtcttta caggggaagg agttagggag atgctggtca acaaattcat atttgcaott 33540 
aggaagaaaa agttcaagag atctctcatc catcatggtg actatagctg atgatatatc 33600
gtattcttgt attagttttt tataaatgtg taacaaataa tcacaaacag ttfaaacagc 33660
actcatttat ttttatctca ctgttttcat gagtcagacg ttcagacaca gottagttaa 33720
gtcctcttct cagggtctca ccaaactgta atcaaggtgt cagctggggt tgtggccaca 33780 
J22S° ^ f*"* ctcctcaa 99 tttgctggca gaattcc??t aSgcagct 33840 
gtagaatgca tgccagcttg ctgctttaac tctttaggaa agtgtctcaa ctccagcaaa 33900
gctcgccctt tttgaaatgg ctoagctgat taggtcaggc ccacctttga taatc^cctt 33960 
ttgatgaatt caaagtcaaa ctcattagag gtcttaatcg catctgtaaa attccctcat 34020 
^? 9 ^ ata taacataacc taatcatgag aatggcatcc ctcatattca cagatcctgc 34080 
ccatatttgg gaggagggga atcacacagg aatcttgggg actatcctag aattctgcoa 34140
accatggggt catggtttcc caatcaatat atggtttggt ataaagaatc cctgaatgct 34200 
taactt™ " a9ttttCt acgtagcctg ccataataat ggtttctaaa actcagaacc 34260 
«t^^ a ? t gCag ^ Ca <=caacttgta atacattgga agtgaaatca ttgccgttta 34320 
*V? C **tt*t atatatatga tgtataatat afcgtatattt cacatatatc ttatatatgt 34380
? aaa o™^ Cataaacttt aaataataaa ataaatgtac atagtattat aggcatttL 34440 
tcaagccaat ggagaaaacc atctaggcat gcagagtttc tgggaacaat ctggaaccca 34500 
aa a at aaaa ? *" taCaaaa ^taaaaggc cttcctgaaa tatataagct ga^attttt 34560
aaggttagat tttaccagga aaaagaatcc aaatggcttt cttgctttga gaagttttta 34620 

tt aaa ™f a v t9 ? aC f a 5 aattatc 9 tt agatgtgcca gatttaacca gaaattcttt 34680 
tttctagaaa ctgcttatat taacttoatt ctgtattgac aattttacca tgaaaaaaat 34740 
ctcta™? tC " CtCaCt tcactctago caaagatgct: gattgtaaat altagaataa 34800 
'^aal, ^" aa " 93 aatcccaaaa tgatctccga gaagccagag tgaaaatcat 34860 
aagtgacgat gtctgcaagc aaccacaggt gtatggcaat gatataaaac ctggaatgtt 34920 
ta?a?S a ^ at3gaag 9aat tt atga tgcctgcagg gtaagttgga gggatttltt 34980 
™» actcaaaaat ttgtatctgg cttagaatat attatatgtt ctttacataa 35040

ggacaaaaca tagatatcat gtcagctcaa aaaagttaca aatgcaaatt tcacagcaca 35100 
aaatactttt aaatgtttta ttaagataaa tgaagtaaga gtttctctga tgctatcaaa 35160 
caaacaaaat tagaatttct taaccagaaa tccaaagatt aataaagclg tltattttct 35220 
taSt!!^^ acattcaaga aagaaaataa tcataaacag agaagtataa agtgatgtta 35280 
^ 9aa ! a ^ aat 9aaaagc aaatattttt cttgaaggaa acatttttgg aacaagtatc 35340 
? gacgtaaata aggcctgaag aataaataac atccaatttc agaataagaa 35400 

tS" 3 tagaaaagac aaaaagcata gccaaaatta tgaaggtgtg aaattacaat 35460 
tcatafcctga gggaactcca agtaattggt tgggtctcag catgaggagg atgagaagag 35520 
aaacaagtag ataaccatga gaaggtggat taggccatgt tgtgatLca tggglcctc 9 lst B0 

gagSctto f aacat99a ^tttccag cgaaggtacg t ??c tt cctg Isslo 

gagacacttg ctttttaaca tgagatactt tagaactcta aggaggccac tctatgtgga 35700 
99aa tggtattgat atcaggtggc agaaagtcct gtccagagtc ccacaaactg 35760
taccacatgt gcgacctcta tcagaaaagg agcagggacc tatgtgacat agaggctggl 35820
at ^ 99tCCa -Sccagoct cggttgctaa taatgtggag ggaggcagg^ 35880
agaatttagg gattccaaca aaaggtccat aocacgggga acaggtggaa ggtgcaggag 35940 
oaggal 39 " ga « ggaCcg 93gaa tt cag gtgaaccatg acatLctga alagcc??" 9 36000 
S 99 ™" tgg *f ata 9 a Satgcttcac tggattgggg agcagaggta aacttgctgc 36060 
ctaactgtgc aaagtaagtg ataaaaoaag gctttagtca tagaaaaata cagtaagtta 36120 
gg«caggta caaggatcca agacaggaat acagtgattg taattg^gc 36180

tctccEao fftTatT, £* tae "« gaagtgcaag caccaccaac acctcg^tt 36240 
cctccataag tctttctctc cagagccctc atgacctaat cacctcttct taagtcccat 36300 
ctctcaacac tattgtattg gagattaagt ttccccaacc tatgaactct tgggctcaca 36360
atac^f a9CaCCaCC = a 9<= a caaaag cacagagctt ccaatctggt ttcLgctcc 
a tttt^ a I a a ° CaaaCa9t aa 9aatcacc tctggaaatg tagcaataat ataatcataa 36480 
tttttaaaat ccagtggaag gattggaaga taaaatcaag gaaatctctc agaaagaaca 36540 
acaacaacaa aaaagacaca gaggagaaaa ataatoagaa aaattaagaa aLtagagga 36600 
acalt^! 9 agatCCaaCa =<= a aatgaat aggagctctg aaaacataaa acgcgagtg^ 3S660 
atlllnlt* aaaaM ^ aaa Saatgctcct agttctgaag cttacatgca tcctattgC 36720 
cataafoa^ aagtagtgCt 999<=acaata aatgaagtac ttctttooaa gacataccat 36780 
cataaagggt cagaagccag ggataaggag aacaatctta aaactttgaa ggaagaacca 36840
tcagaactac atagaactcc tcaacagtaa ctctagaagg tagacgatgg tggaaaacac 36900
attcaaattt caaagggaag attatttcaa cctagattcc tacccatgct aactaaatat 36960 
caactgtgag ggtggaatta agaagtttag acaagcaatg actgaaaaaa atgtacttct 37020 
gataccctac ttcttaggaa actacttgag agggtacctc agcaaaatga gggaataaat 37080 
caagaaagtg gaagacgtaa gacctgaaac tgttagtcca acactaaaga gtggtatcag 37140 
ataatcccaa caccatagct ctgcacoagg cttaaagtaa ccagctcgaa tttgagcaga 37200 
^?!f!?! aa f 9T att gtgt3t atgtgtatgt gtatgtgtgt atgtgtgtgt gtgtgtgtgt 37260 
gtgtgttgat atggtggaac agottoagag gaagtaaaag aactaacaag ctatctgatg 37320 
tccttgaaca ttagtaaaca ttattgtgag gtgttggtag atcttttgga gcattcagca 37380 
tttaccaggt acatagaaaa ctatccacat gaaaaaaaga gttgtgttat taattctagc 37440

tacln aaa ^ a ^ attt f I taatccaaat *tgttaettg actcttcaat taataaaatt 37500 
tactaaat 9t aggctgttaa tttaaccaaa aatagagatg ctataatgta 37560 
aagatgtggt gtggaaaagt tgcaaagaag ttgtaaaaca actaaatccc taactacgta 37620 
agagaaaata aatatttact gtctaaacct agaagctgta atttgagcat attatctagt 37680
gataaggagt tagatactat aagaaatcat taaacaagca tgaagtggct acctcttgga 37740 
gaacagcttg cgtgaggtaa catgggacat aactgctttt caagcctctt catgtttttt 37800 
cgtttttgcc ttttttaact aagtgctgtt tactctaaca aaaLaattt tattttttaa 37860
tct^ttE ^ 9aaC ^" aa SSCtctttgt aatattaaaa tccatgtctc aattaattat 37920 
tctgtgttga tagtctatac atgtactgtc tagtaacaaa atatgtgatt oatcaaaata 37980 
tgagctttat gtttagctaa ttttctttct tttttcttat gtttttattt 38040 
too^S tct 99999 ac <*ttagtcac aagggatctg aaagatacgt ggtatctoat 38100 
«f? a £T* agctggggag ataactgtgg tcaaaaggac aagcotggag tctacacaca 38160 
agtgacttat taccgaaaot ggattgcttc aaaaacaggc atctaattca cgataaaagt 38220 
taaacaaaga aagctgtatg caggtcatat atgcatgaga attcaactat ttagtgggtg 38280 
tagtacaaca aagtgatatf aaattactgg atctagtaac atgaaacaca caacgilag? 38340 
£S t S P *J ««"aatc aaccaataat ccttagccaa tttataaggg acfctitatct 38400 
a^ 33 ^^ ^ ^" Ct fc ? aaaaatac 99*W*a cttagctctt taaatcacga 38460 
aa^^r aCCa 9 fc 9 a 9 a ctoaatac " atttttgaag atagtccatg ggatttttag 38520 
aatgtcgttg teaagggtct ccttttaact gagaaacttt ttgaactcac aaagtgttca 38580 
a ^ aaaC °f tt 9* ataattcc ctacatttct ctcgagctca caaatacttt ttfcttctttt 38640 
tccttattca atcagatttt ccaaagtacc tttccaccat aagaaatgaa ttttctactt 38700
ctacacccat ttgagagaca ccaataaaag aaagtcatat gtaggaaaca aagtctgata 38760 
gtaaaacaag ccagagatct tctaactttt tttagttata aaacctctaa tttttggtga 38820 
cttttctaca cacacacaca cata 38844



<210> 4 
<211> 407 
<212> PRT 

<213> Human
<400> 4 

Glu Pro Trp Val Ile Gly Leu Val Ile Phe Ile Ser Leu Ile Val Leu
1               5                   10                  15

Ala Val Cys Ile Gly Leu Thr Val His Tyr Val Arg Tyr Asn Gln Lys
            20                  25                  30

Lys Thr Tyr Asn Tyr Tyr Ser Thr Leu Ser Phe Thr Thr Asp Lys Leu
        35                  40                  45

Tyr Ala Glu Phe Gly Arg Glu Ala Ser Asn Asn Phe Thr Glu Met Ser
    50                  55                  60

Gln Arg Leu Glu Ser Met Val Lys Asn Ala Phe Tyr Lys Ser Pro Leu
65                  70                  75                  80

Arg Glu Glu Phe Val Lys Ser Gln Val Ile Lys Phe Ser Gln Gln Lys
                85                  90                  95

His Gly Val Leu Ala His Met Leu Leu Ile Cys Arg Phe His Ser Thr
            100                 105                 110

Glu Asp Pro Glu Thr Val Asp Lys Ile Val Gln Leu Val Leu His Glu
        115                 120                 125

Lys Leu Gln Asp Ala Val Gly Pro Pro Lys Val Asp Pro His Ser Val
    130                 135                 140

Lys Ile Lys Lys Ile Asn Lys Thr Glu Thr Asp Ser Tyr Leu Asn His
145                 150                 155                 160

Cys Cys Gly Thr Arg Arg Ser Lys Thr Leu Gly Gln Ser Leu Arg Ile
                165                 170                 175

Val Gly Gly Thr Glu Val Glu Glu Gly Glu Trp Pro Trp Gln Ala Ser
            180                 185                 190

Leu Gln Trp Asp Gly Ser His Arg Cys Gly Ala Thr Leu Ile Asn Ala
        195                 200                 205

Thr Trp Leu Val Ser Ala Ala His Cys Phe Thr Thr Tyr Lys Asn Pro
    210                 215                 220

Ala Arg Trp Thr Ala Ser Phe Gly Val Thr Ile Lys Pro Ser Lys Met
225                 230                 235                 240

Lys Arg Gly Leu Arg Arg Ile Ile Val His Glu Lys Tyr Lys His Pro
                245                 250                 255

Ser His Asp Tyr Asp Ile Ser Leu Ala Glu Leu Ser Ser Pro Val Pro
            260                 265                 270

Tyr Thr Asn Ala Val His Arg Val Cys Leu Pro Asp Ala Ser Tyr Glu
        275                 280                 285

Phe Gln Pro Gly Asp Val Met Phe Val Thr Gly Phe Gly Ala Leu Lys
    290                 295                 300

Asn Asp Gly Tyr Ser Gln Asn His Leu Arg Glu Ala Gln Val Thr Leu
305                 310                 315                 320

Ile Asp Ala Thr Thr Cys Asn Glu Pro Gln Ala Tyr Asn Asp Ala Ile
                325                 330                 335

Thr Pro Arg Met Leu Cys Ala Gly Ser Leu Glu Gly Lys Thr Asp Ala
            340                 345                 350

Cys Gln Gly Asp Ser Gly Gly Pro Leu Val Ser Ser Asp Ala Arg Asp
        355                 360                 365

Ile Trp Tyr Leu Ala Gly Ile Val Ser Trp Gly Asp Glu Cys Ala Lys
    370                 375                 380

Pro Asn Lys Pro Gly Val Tyr Thr Arg Val Thr Ala Leu Arg Asp Trp
385                 390                 395                 400

Ile Thr Ser Lys Thr Gly Ile
                405